1.1 Background and Motivation
Many practical reliable distributed systems do not rely on network synchrony, both because networks go through outages and periods of DDoS attacks, and because synchronous protocols have hard-coded steps that delay the system, under all conditions, by the worst-case communication latency bound. Instead, asynchronous replication solutions via State Machine Replication (SMR) [DBLP:journals/csur/Schneider90] usually optimize for stability periods. This approach is modeled as partial synchrony [dwork1988consensus]. It allows for periods of asynchrony in which progress might be compromised, but consistency never is. This paradigm underlies most successful industrial solutions, for example, the Google Chubby lock service [DBLP:conf/osdi/Burrows06], Yahoo’s ZooKeeper [DBLP:conf/usenix/HuntKJR10], etcd [etcd], Google’s Spanner [DBLP:journals/tocs/CorbettDEFFFGGHHHKKLLMMNQRRSSTWW13], Apache Cassandra [cassandra2014apache] and others.
In the partial synchrony settings, SMR solutions typically divide the execution into logical rounds (e.g., [lamport2001paxos, boyd2006randomized, castro1999practical]). In each round, progress depends on synchronizing all non-faulty nodes to overlap in the round for a sufficiently long period. For example, rounds can be used to designate a leader whose job is to drive progress in the protocol. Progress depends on a non-faulty leader emerging and all non-faulty nodes communicating with the leader in a timely manner. A round might not succeed in making progress due to a faulty leader or because nodes enter the round at different times. Nodes may give up on a bad round by timer expiration and advance to the next round.
If we only care about eventuality, the seminal work of Chandra and Toueg [chandra1996weakest, chandra1996unreliable] captures the abstract conditions under which progress is guaranteed. Specifically, they define a leader election abstraction, denoted Ω, where eventually all non-faulty nodes trust the same non-faulty node as the leader. Indeed, Ω was shown to be the weakest “failure detector” needed in order to solve consensus.
Whereas failure detector analysis focuses on possibility/impossibility, here we care about how long it takes for a good leader to emerge and at what communication cost.
We introduce the problem of view synchronization, which is how to repeatedly bring all honest nodes to execute the same view for a sufficiently long time, in the face of asynchrony and in the presence of faulty nodes. We define two performance measures, latency and communication complexity, both measured between two abstract view change events.
More concretely, a view change occurs as an interplay between the view synchronization protocol and the outer consensus solutions. The consensus solution must signal that it wishes to end the current view via a notification. The synchronizer eventually invokes a consensus signal to indicate when a new view starts. Latency and communication complexities are measured in between these two events.
In this paper, we tackle the view synchronization problem against asynchrony and the most severe type of faults, Byzantine [lamport1982byzantine, lamport1983weak]. This makes the synchronizers we develop particularly suited for Byzantine Fault Tolerance (BFT) consensus systems relevant in today’s crypto-economic systems.
More specifically, we assume a system of n nodes that need to form a sequence of consensus decisions that implement SMR. We assume up to f < n/3 nodes are Byzantine, the upper bound on the number of Byzantine nodes for which Byzantine agreement is solvable [fischer1986easy]. The challenge is that during “happy” periods, progress might be made among a group of Byzantine nodes cooperating with a “fast” sub-group of the honest nodes. Indeed, many solutions advance when a leader succeeds in proposing a value to a quorum of nodes, but it is possible that only the “fast” honest nodes learn it and progress to the next round. The remaining “slow” honest nodes might stay behind, and may not even advance rounds (not only views) at all. Then at some point, the Byzantine nodes may stop cooperating. A mechanism is needed to bring the “slow” nodes to the same round as the “fast” ones.
Usually, the steps for synchronizing rounds are intertwined with the consensus protocol itself, leaving each distributed protocol to devise its own solution and integrate it into the protocol. This often complicates protocol design and makes it hard to evaluate and reason about. Often, liveness is dealt with only once, and only in eventuality. Indeed, in classic works such as [lamport2001paxos] which solve consensus, leader election (Ω) was part of the protocol, without being discussed as a separate problem and without an analysis of the latency and communication costs of possible implementations. In addition, Ω was designed for a one-time consensus instance, and not for SMR, which requires recurring leader election. The result is that many protocols provide only eventual liveness, without taking into account constraints such as latency bounds and communication complexity. To the best of our knowledge, this paper is the first to formally define the problem of view synchronization and explore its characteristics as a separate problem.
We present three different algorithms to reach view synchronization and compare them in terms of latency (how long it takes to reach round synchronization), and communication complexity (how many messages need to be sent in the network overall). We name the module which reaches round synchronization a synchronizer, and look at higher-level protocols, such as consensus, which can utilize the synchronizer. Each protocol we present has tradeoffs compared with others, and we discuss these in the paper.
Some of the improvements embodied in these protocols may be valuable for a family of Byzantine SMR protocols, which includes Zyzzyva [kotla2007zyzzyva], Casper [buterin2017casper], Tendermint [buchman2017tendermint, buchman2018tendermint], HotStuff [yin2019hotstuff], and LibraBFT [baudet2019librabft]. All these protocols state that in certain cases, e.g., as long as there is an honest leader, the protocol makes progress with linear communication and constant latency. Synchronizers such as the ones described in this paper may help find optimizations for these consensus protocols and broaden the cases in which they achieve linear communication complexity and constant latency.
1.2 View synchronization algorithms
We assume the eventual synchrony model, first introduced by Dwork, Lynch, and Stockmeyer (DLS) [dwork1988consensus]. It divides the execution into a period of asynchrony, in which messages between honest nodes are eventually delivered but with no bound on arrival time, and a period that begins at a global time named GST (Global Stabilization Time), after which the execution is synchronous and messages between honest nodes arrive within a bounded time. More detail on the model is given in Section 2.
We introduce three synchronization protocols, which are described intuitively next.
View doubling synchronizer
The first protocol doubles the duration of each view, e.g., the second view is twice as long as the first. Eventually, there is guaranteed to be a sufficiently long time in which all the nodes are in the same view. This protocol does not require any message transmission between the nodes in the system; however, the time it takes to reach view synchronization is unbounded.
The idea of doubling the view duration is inspired by the PBFT consensus algorithm [castro1999practical], where each view has a timer; if no progress is made before the timer expires, an honest node tries to move to the next view and doubles its timer. Note that in PBFT, though, messages do pass between nodes as part of the consensus protocol itself, and if progress is made, the timer of the view is reset to its initial value.
Broadcast-based synchronizer
The second algorithm we present is based on the Bracha reliable broadcast algorithm [bracha1987asynchronous]. This is a leaderless algorithm that after GST ensures that once nodes wish to enter the same view, all the honest nodes will enter the same view within a bounded time.
This protocol is based on reliable broadcast, which requires a quadratic number of messages to succeed, and has a constant latency.
Cogsworth: Leader-based synchronizer
The third protocol we describe is Cogsworth, which is leader-based. This protocol utilizes views that have an honest leader as a relay point for messages, instead of broadcasting them. Thus, when a node wishes to send a message across the network, it will only send it to the leader of the view. If the leader is honest, it will gather the messages from the nodes and multicast them using a threshold signature [boneh2001short, cachin2005random, shoup2000practical] to the rest of the nodes, with only linear communication costs, instead of the usual reliable broadcast, which requires a quadratic number of messages [bracha1987asynchronous]. The protocol itself, since it cannot rely on broadcast, has to deploy other measures to ensure that view synchronization is still achieved.
The latency and communication complexity of this algorithm depend on the number of consecutive Byzantine leaders after GST, and in the average case, the communication complexity is optimal linear and the latency is expected constant. The downside of this approach though is that in the worst-case of consecutive Byzantine leaders after GST, the latency is linear with expected quadratic communication costs.
In addition, if considering only benign failures, then the overall communication complexity is expected linear instead of optimal linear, while the latency is still expected constant.
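Table 1: Comparison of the three synchronizers in terms of communication complexity and latency.
| | View doubling | Broadcast-based | Cogsworth |
| Communication complexity | 0 | quadratic | Byzantine: optimal linear, expected quadratic; Benign: expected linear, worst-case quadratic |
| Latency | unbounded | constant | Byzantine + Benign: expected constant, worst-case linear |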
A comparison of the three approaches in terms of latency and communication complexity is summarized in Tab. 1. See Section 3 for a formal description of how the two are computed. Note that all messages sent in all the protocols are constant size. In the table, is the number of actual Byzantine failures, and is the bound on message delivery after GST.
From the table, one can deduce that each protocol has its pros and cons. While the view doubling synchronizer sends 0 messages, its latency until view synchronization is unbounded; in contrast, the broadcast-based synchronizer sends a quadratic number of messages but ensures constant latency.
Meanwhile, the results for Cogsworth depend on the ordering of the leaders: latency is expected constant but linear in the worst case, and communication complexity against Byzantine nodes is linear at best and quadratic in expectation. When facing benign failures, Cogsworth has expected linear communication costs with worst-case quadratic, and still expected constant latency.
The choice between the three protocols for an upper-layer protocol to use depends on the needs. E.g., if the upper-layer protocol is leader-based and progress is not guaranteed with a Byzantine leader, then the use of the reliable broadcast-based synchronizer might not be the right choice, since even though it will get view synchronization within constant time, the leader of the upper-layer protocol will not drive progress if it is Byzantine. Thus, a more natural choice will be Cogsworth. If there is a need to lower communication complexity, one might choose the view doubling synchronizer at a cost of potential high latency and so on.
The contributions of this paper are as follows:
To the best of our knowledge, this is the first paper to formally define the problem of view synchronization as a separate problem in the distributed setting. This definition helps to bridge the gap between classic works on failure detectors for one-time consensus and an infinite run of an SMR consisting of multiple instances of single-shot consensus.
We describe two natural algorithms for view synchronization. The first is based on view duration doubling, and the second on all-to-all communication, i.e., reliable broadcast. This is the first time that these algorithms are described as a separate component for reaching view synchronization. We prove their correctness and analyze both in terms of communication complexity and latency until view synchronization is reached.
We introduce a third view synchronization protocol, Cogsworth, which is leader-based. Cogsworth exhibits optimal linear communication when facing Byzantine leaders and expected linear communication with benign failures. In both settings the expected latency of Cogsworth is constant. Like the previous two protocols, we do a full analysis of its correctness.
We compare the three view synchronization protocols and the tradeoffs between them. This allows an upper-layer protocol to make a more informed choice of which synchronizer to use.
The rest of this paper is structured as follows: Section 2 discusses the model; Section 3 formally defines the view synchronization problem; Section 4 defines the three view synchronization protocols with formal correctness proofs and comparison of their latency and communication cost; Section 5 describes related work; and Section 6 concludes the paper.
2 Model
We follow the eventual synchronous model [dwork1988consensus] in which the execution is divided into two durations; first, an unbounded period of asynchrony, where messages do not have a bounded time until delivered; and then, a period of synchrony, where messages are delivered within a bounded time, denoted as . The switch between the first and second periods occurs at a moment named Global Stabilization Time (GST). We assume all messages sent before GST arrive at or before .
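To make the timing assumption concrete, the following minimal Python sketch computes the latest arrival time of a message between honest nodes. The parameter names `gst` and `delta` are ours, and the sketch assumes that a message sent before GST arrives no later than GST plus the synchrony bound; this is our reading of the model, not a definition taken from it.

```python
def delivery_deadline(send_time: float, gst: float, delta: float) -> float:
    """Latest arrival time of a message sent between two honest nodes.

    Assumption: messages sent before GST arrive by gst + delta, and
    messages sent at or after GST arrive within delta of being sent.
    """
    return max(send_time, gst) + delta


print(delivery_deadline(send_time=3, gst=10, delta=2))   # 12
print(delivery_deadline(send_time=15, gst=10, delta=2))  # 17
```

Taking the maximum of the send time and GST covers both periods with a single expression.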
Our model consists of a set of nodes, and a known mapping, denoted by : . We use a cryptographic signing scheme, a public key infrastructure (PKI) to validate signatures, as well as a threshold signing scheme [boneh2001short, cachin2005random, shoup2000practical]. The threshold signing scheme is used in order to create a compact signature of -of- nodes and is used in other consensus protocols such as [cachin2005random]. Usually or .
We assume a non-adaptive adversary who can corrupt up to nodes at the beginning of the execution. This corruption is done without knowledge of the mapping . The set of remaining honest nodes is denoted . We assume the honest nodes may start their local execution at different times.
In addition, as in [abraham2019VABA, cachin2005random], we assume the adversary is polynomial-time bounded, i.e., the probability it will break the cryptographic assumptions in this paper (e.g., the cryptographic signatures, threshold signatures, etc.) is negligible.
3 Problem definition
We define a synchronizer, which solves the view synchronization problem, to be a long-lived task with an API that includes a operation and a signal, where . Nodes may repeatedly invoke , and in return get a possibly infinite sequence of signals. Informally, the synchronizer should be used by a high-level abstraction (e.g., BFT state-machine replication protocol) to synchronize view numbers in the following way: All nodes start in view , and whenever they wish to move to the next view they invoke . However, they move to view only when they get a signal.
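The interplay between the synchronizer and the upper-layer protocol can be sketched as a small interface. In the Python sketch below, the names `wish_to_advance`, `on_new_view`, `ImmediateSynchronizer`, and `UpperLayer` are hypothetical, and the initial view is assumed to be 0. The toy synchronizer grants every wish at once, which is only meaningful for a single node; it serves to show the calling convention, not a real protocol.

```python
from typing import Callable, List


class Synchronizer:
    """Hypothetical synchronizer interface (names are illustrative)."""

    def __init__(self, on_new_view: Callable[[int], None]) -> None:
        # Callback playing the role of the synchronizer's signal.
        self._on_new_view = on_new_view

    def wish_to_advance(self) -> None:
        """Called by the upper layer when it wants to leave its view."""
        raise NotImplementedError


class ImmediateSynchronizer(Synchronizer):
    """Toy single-node stand-in: every wish is granted at once."""

    def __init__(self, on_new_view: Callable[[int], None]) -> None:
        super().__init__(on_new_view)
        self._view = 0  # assumed initial view number

    def wish_to_advance(self) -> None:
        self._view += 1
        self._on_new_view(self._view)


class UpperLayer:
    """Upper-layer protocol that changes view only when signalled."""

    def __init__(self) -> None:
        self.views: List[int] = [0]
        self.sync = ImmediateSynchronizer(self.enter_view)

    def enter_view(self, view: int) -> None:
        self.views.append(view)


node = UpperLayer()
node.sync.wish_to_advance()
node.sync.wish_to_advance()
print(node.views)  # [0, 1, 2]
```

The upper layer never changes its own view number; it only records the views that the synchronizer signals.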
Formally, a time interval consists of a starting time and an ending time and all the time points between them. ’s length is . We say if begins after or when begins, and ends before or when ends. We denote by the time when node gets the signal , and assume that all nodes get at the beginning of their execution. We denote time as the time when the last honest node began its execution, formally . We further denote as the time interval in which node is in view , i.e., begins at and ends at . We say node is at view at time , or executes view at time , if .
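The interval notions above translate directly into code. The following Python sketch is illustrative only: the class and function names are ours, and intervals are assumed to be closed at both ends.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interval:
    start: float
    end: float

    def length(self) -> float:
        return self.end - self.start

    def contains(self, other: "Interval") -> bool:
        # `other` begins after or when `self` begins,
        # and ends before or when `self` ends.
        return self.start <= other.start and other.end <= self.end

    def holds(self, t: float) -> bool:
        # True if a node whose view spans this interval is in the view at t.
        return self.start <= t <= self.end


def common_interval(intervals: Iterable[Interval]) -> Optional[Interval]:
    """Time span during which all given per-node view intervals overlap."""
    items = list(intervals)
    start = max(i.start for i in items)
    end = min(i.end for i in items)
    return Interval(start, end) if start <= end else None


a = Interval(10, 18)  # node A executes some view during [10, 18]
b = Interval(12, 25)  # node B executes the same view during [12, 25]
shared = common_interval([a, b])
print(shared, shared.length())  # Interval(start=12, end=18) 6
```

View synchronization asks for such a shared interval, of sufficient length, to be contained in the view interval of every honest node.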
We are now ready to define the two properties that any synchronizer must achieve:
Property 1 (View synchronization).
For every there exists and an infinite number of time intervals and views , such that if every interval between two invocations by an honest node is , then and .
Property 2 (Synchronization validity).
A new view is proposed by the synchronizer only if some honest node wished to advance to it. Formally, the synchronizer signals only if there exists an honest node and some view s.t. calls at least times while executing view .
Latency and communication complexity
In order to define how the latency and message communication complexity are calculated, we first define to be the time at which the -th view synchronization is reached. Formally, , where is defined according to Property 1.
With this we can define the latency of a synchronizer implementation:
[Synchronizer latency] The latency of a synchronizer is defined as .
Next, in order to define communication complexity, we first need to introduce a few more notations. Let be the total number of messages sent between and . In addition, denote as the total number of messages sent by from the beginning of ’s execution and .
With this we define the communication complexity of a synchronizer implementation: [Synchronizer communication complexity] Denote the -th view in which view synchronization occurs (Property 1). The message communication cost of a synchronizer is defined as .
We now proceed to discuss three protocols for synchronizers.
4 Protocols for view synchronization
Next, we present the three view synchronization protocols discussed informally in Section 1.2. All protocol messages between nodes are signed and verified; for brevity, we omit the details about the cryptographic signatures. In the protocol, when a node collects messages from senders, it is implied that these messages carry distinct signatures.
4.1 View doubling synchronizer
A solution approach inspired by PBFT [castro1999practical] is to use view doubling as the view synchronization technique. In this approach, each view has a timer, and if no progress is made the node tries to move to the next view and doubles the timer time for the next view. Whenever progress is made, the node resets its timer. This approach is intertwined with the consensus protocol itself, making it hard to separate, as the messages of the consensus protocol are part of the mechanism used to reset the timer.
We adopt this approach and turn it into an independent synchronizer that requires no messages. First, the nodes need to agree on some predefined constant which is the duration of the first view. Next, there exists some global view duration mapping , which maps a view to its duration: . A node in a certain view must move to the next view once this duration passes, regardless of the outer protocol actions.
The view doubling protocol is described in Algorithm 1. A node starts at view (Algorithm 1) and a view duration of (Algorithm 1). Next, when is called, a counter named wish is incremented (Algorithm 1). This counter guarantees validity by moving to a view only when the wish counter reaches . Every time a view ends (Algorithm 1), an internal counter curr is incremented, and if the wish allows it, the synchronizer outputs with a new view .
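The description above can be mirrored in a short simulation. The Python sketch below is not Algorithm 1 itself: it assumes that view `v` lasts `alpha * 2**v` (each view twice as long as the previous one), that a signal for the new view is emitted only when the wish counter is at least the new value of `curr`, and it drives time explicitly through a hypothetical `advance_time` method.

```python
from typing import List, Tuple


class ViewDoublingSynchronizer:
    def __init__(self, alpha: float, start_time: float = 0.0) -> None:
        self.alpha = alpha                   # duration of the first view
        self.curr = 0                        # internal view counter
        self.wish = 0                        # number of wish_to_advance calls
        self.view_end = start_time + alpha   # time at which view 0 ends
        self.signals: List[Tuple[float, int]] = []  # (time, view) outputs

    def duration(self, view: int) -> float:
        # Assumed duration mapping: every view is twice as long as the last.
        return self.alpha * 2 ** view

    def wish_to_advance(self) -> None:
        self.wish += 1

    def advance_time(self, now: float) -> None:
        """Process every view expiry that happens up to time `now`."""
        while self.view_end <= now:
            ended_at = self.view_end
            self.curr += 1
            self.view_end = ended_at + self.duration(self.curr)
            if self.wish >= self.curr:  # validity guard (assumed form)
                self.signals.append((ended_at, self.curr))


sync = ViewDoublingSynchronizer(alpha=1)
for _ in range(3):
    sync.wish_to_advance()
sync.advance_time(7)
print(sync.signals)  # [(1, 1), (3, 2), (7, 3)]
```

With `alpha = 1`, views 0, 1, and 2 last 1, 2, and 4 time units, so they end at times 1, 3, and 7. A node that never calls `wish_to_advance` still advances `curr` internally but emits no signal.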
We show that the view doubling protocol achieves the properties required by a synchronizer.
The view doubling protocol achieves view synchronization (Property 1).
Since this protocol does not require sending messages between nodes, the Byzantine nodes cannot affect the behavior of the honest nodes, and we can treat all nodes as honest.
Recall that denotes the time by which all the honest nodes started their local execution of Algorithm 1. Let be the view at which node is at during . W.l.o.g assume at time . It follows from the definition of and the sum of a geometric series that
We begin by showing that for every the following condition holds: for any view . Let and . From the ordering of the node starting times, for all . We get:
Hence, for , since at node had a view number larger than , then will start all future views before .
Next, let and , i.e., the minimal view and the maximal view at respectively. To prove that the first interval of view synchronization is achieved, it suffices to show that for any constant there exists a time interval and a view such that and . Using this, we will show that there exists an infinite number of such intervals and views which will conclude the proof.
Indeed, first note that as shown above, node will start view before any other node in the system. The left-hand side of the equation is the time length in which both node and node execute together view . If the left-hand side is negative, then there does not exist an overlap, and if it is positive then an overlap exists.
For any there exists a minimum view number such that the inequality holds, and since is the minimum view number at this solution holds for any other node as well. In addition, for any the inequality also holds, meaning there is an infinite number of solutions for it.
The view doubling protocol achieves synchronization validity (Property 2).
The if condition in Algorithm 1 ensures that the output of the synchronizer will always be a view that a node wished to advance to. ∎
This concludes the proof that view doubling is a synchronizer for any .
Latency and communication
Since the protocol sends no messages between the nodes, it is immediate that the communication complexity is .
As for latency, the minimal satisfying Eq. 2 grows with . Since the initial view-gap is unbounded, so is the view in which synchronization is reached. The latency to synchronization is , also unbounded.
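The dependence of the synchronization view on the initial gap can be checked numerically. The sketch below uses a simplified model of our own rather than the exact setting of the proof: every node starts view 0 at its own start time, and view `v` lasts `alpha * 2**v`. Under these assumptions, the time during which all nodes share view `v` works out to `alpha * 2**v` minus the gap between the earliest and latest start times.

```python
from typing import List, Tuple


def view_interval(start: float, alpha: float, view: int) -> Tuple[float, float]:
    """(begin, end) of `view` for a node that started view 0 at `start`."""
    begin = start + alpha * (2 ** view - 1)  # sum of durations of views 0..view-1
    return begin, begin + alpha * 2 ** view


def overlap(starts: List[float], alpha: float, view: int) -> float:
    """Length of time all nodes spend together in `view` (negative if none)."""
    intervals = [view_interval(s, alpha, view) for s in starts]
    return min(end for _, end in intervals) - max(begin for begin, _ in intervals)


def first_sync_view(starts: List[float], alpha: float, c: float) -> int:
    """Smallest view in which all nodes overlap for at least `c` time units."""
    view = 0
    while overlap(starts, alpha, view) < c:
        view += 1
    return view


print(first_sync_view([0, 5], alpha=1, c=3))   # 3
print(first_sync_view([0, 50], alpha=1, c=3))  # 6
```

A larger gap between start times pushes synchronization to a later, and exponentially longer, view, which illustrates why the latency of this synchronizer cannot be bounded in advance.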
4.2 Broadcast-based synchronizer
Another leaderless approach is based on the Bracha reliable broadcast protocol [bracha1987asynchronous] and is presented in Algorithm 2. In this protocol, when a node wants to advance to the next view it multicasts a message (multicast means to send the message to all the nodes including the sender) (Algorithm 2). When at least messages are received by an honest node, it multicasts as well (Algorithm 2). A node advances to view upon receiving messages (Algorithm 2).
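The message flow of this protocol can be illustrated with a small simulation. The Python sketch below is not Algorithm 2: it assumes n = 3f + 1 nodes, an amplification threshold of f + 1 messages, and an advance threshold of 2f + 1 messages (the usual Bracha-style values), models Byzantine nodes as silent, and delivers messages in FIFO order. All class and function names are ours.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set, Tuple

Message = Tuple[int, int, int]  # (sender, destination, view)


class BroadcastNode:
    def __init__(self, node_id: int, n: int, f: int) -> None:
        self.node_id, self.n, self.f = node_id, n, f
        self.view = 0
        self.sent: Set[int] = set()  # views already multicast
        self.votes: Dict[int, Set[int]] = defaultdict(set)

    def multicast(self, view: int) -> List[Message]:
        if view in self.sent:  # send at most once per view
            return []
        self.sent.add(view)
        return [(self.node_id, dest, view) for dest in range(self.n)]

    def wish_to_advance(self) -> List[Message]:
        return self.multicast(self.view + 1)

    def receive(self, sender: int, view: int) -> List[Message]:
        self.votes[view].add(sender)  # distinct senders only
        out: List[Message] = []
        if len(self.votes[view]) >= self.f + 1:  # amplification step
            out = self.multicast(view)
        if len(self.votes[view]) >= 2 * self.f + 1 and view > self.view:
            self.view = view  # enough support: enter the view
        return out


def run(nodes: List[BroadcastNode], pending: List[Message], silent: Set[int]) -> int:
    """Deliver messages until none remain; return the number of messages sent."""
    queue: Deque[Message] = deque(pending)
    sent = 0
    while queue:
        sender, dest, view = queue.popleft()
        sent += 1
        if dest not in silent:
            queue.extend(nodes[dest].receive(sender, view))
    return sent


nodes = [BroadcastNode(i, n=4, f=1) for i in range(4)]
initial = nodes[0].wish_to_advance() + nodes[1].wish_to_advance()
total = run(nodes, initial, silent={3})
print([node.view for node in nodes], total)  # [1, 1, 1, 0] 12
```

Node 2 never asked to advance, yet it enters view 1 because the two wishes it receives reach the amplification threshold. A single wish, by contrast, is not enough to move any node.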
We prove that the broadcast-based protocol is a synchronizer.
After GST, whenever an honest node enters view at time , all other honest nodes enter view by , i.e.,
Suppose an honest node enters view at time , then it received messages, from at least honest nodes (Algorithm 2).
Since the only option for an honest node to disseminate message is by multicasting it, then by all nodes will receive at least messages. Then, any left honest nodes (at most nodes) will thus receive enough to multicast the message on their own (Algorithm 2) which will be received by by all the nodes. This ensures that all the honest nodes receive messages and enter view by .
After GST, eventually an honest node enters some new view. All honest nodes begin their local execution at view , potentially at different times. Based on the protocol eventually at least nodes (some of them might be Byzantine) send . This is because is called every . Thus, eventually all honest nodes will reach view , and from Section 4.2 the difference between their entry is at most after GST.
The above argument can be applied inductively. Suppose at time node is at view . We again know that by all other honest nodes are also at view , and once are sent all honest nodes will eventually enter view , and we are done.
The broadcast-based protocol achieves view synchronization (Property 1).
From Section 4.2 an honest node will eventually advance to some new view and from Section 4.2 after all other honest nodes will join it. For any , if the honest nodes call every then it is guaranteed that all the honest nodes will execute view together for at least time, since it requires messages to move to view , i.e., at least one message is sent from an honest node.
This argument can be applied inductively, thus making an infinite number of time intervals and views which all honest nodes execute at the same time. ∎
The broadcast-based synchronizer achieves synchronization validity (Property 2).
In order for an honest node to advance to view it has to receive messages (Algorithm 2). From those, at least originated from honest nodes. An honest node can send in two scenarios:
(i) was called when the node was at view (Algorithm 2) and we are done.
(ii) It received messages (Algorithm 2), meaning at least one honest node which already sent the message was at view and called and again we are done. ∎
This concludes the proof that the broadcast-based synchronizer is a view synchronizer for any .
Latency and communication
For latency, the broadcast-based synchronizer will take a constant time to reach view synchronization after GST, as we have proved, and also the same between every two consecutive occurrences of view synchronization. Thus, the latency of this protocol is .
For communication costs, the protocol requires that every node sends one message to all the other nodes, and since the latency is constant, the overall communication costs are quadratic, i.e., .
4.3 Cogsworth: Leader-based synchronizer
In this section we present Cogsworth, a new approach for view synchronization that leverages leaders to optimistically achieve linear communication. The key idea is that instead of nodes broadcasting synchronization messages all-all and incurring quadratic communication, nodes send messages to the leader of the view they wish to enter. If the leader is honest, it will relay a single broadcast containing an aggregate of all the messages it received, thus incurring only linear communication.
If a leader of a view is Byzantine, it might not help as a relay. In this case, the nodes time out and then try to enlist the leaders of subsequent views, one by one, up to view , to help with relaying. Since at least one of those leaders is honest, one of these leaders will successfully relay the aggregate.
The full protocol is presented in Algorithm 3 and consists of several message types. The first two are sent from a node to a leader. They are used to signal to the leader that the node is ready to advance to the next stage in the protocol. Those messages are named and where is the view the message refers to.
The other two message types are ones that are sent from leaders to nodes. The first is called (short for “Time Certificate”) and is sent when the leader receives messages; and the second is called (short for “Quorum Certificate”) and is sent when the leader receives messages. In both cases, a leader aggregates the messages it receives using threshold signatures such that each broadcast message from the leader contains only one signature.
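The leader-side aggregation step can be sketched as follows. This Python sketch does not implement threshold signatures: a set of distinct signer identifiers stands in for the aggregated signature, the threshold is left as a parameter, and the name `CertificateCollector` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Optional, Set


class CertificateCollector:
    """Collects per-view messages and emits one certificate per view."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._signers: Dict[int, Set[int]] = defaultdict(set)
        self._issued: Set[int] = set()

    def add(self, view: int, signer: int) -> Optional[FrozenSet[int]]:
        """Record a message; return a certificate the first time the
        number of distinct signers for `view` reaches the threshold."""
        self._signers[view].add(signer)  # repeated signers count once
        if view in self._issued or len(self._signers[view]) < self.threshold:
            return None
        self._issued.add(view)
        return frozenset(self._signers[view])


collector = CertificateCollector(threshold=3)
results = [collector.add(5, s) for s in (0, 0, 1, 2, 3)]
print(results[3])  # frozenset({0, 1, 2})
```

Only the fourth call yields a certificate: the repeated message from signer 0 is ignored, and the late message from signer 3 arrives after the certificate has already been issued.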
The general flow of the protocol is as follows: When is invoked, the node sends to , where is the view succeeding curr (Algorithm 3). Next, there are two options: (i) If forms a , it broadcasts it to all nodes (Algorithm 3). The nodes then respond with a message to the leader (Algorithm 3). (ii) Otherwise, if time elapses after sending to without receiving , a node gives up and sends to the next leader, i.e., (Algorithm 3). It then waits again before forwarding to , and so on, until is received.
Whenever has been received, a node sends (even if it did not send ) to . Additionally, as above, it enlists leaders one by one until is obtained. Here, the node sends leaders as well as . When a node finally receives from a leader, it enters view immediately (Algorithm 3).
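The leader fallback described above can be sketched in a simplified model. The Python sketch below makes several assumptions that are ours: leaders are assigned round-robin (`view % n`), a node tries at most f + 1 consecutive leaders, faulty leaders are silent (benign), and each of the two phases costs one message per node per attempted leader plus one multicast by the first honest leader.

```python
from typing import Set


def leader(view: int, n: int) -> int:
    # Assumed round-robin leader mapping; the paper only requires a known mapping.
    return view % n


def first_honest_offset(view: int, n: int, f: int, faulty: Set[int]) -> int:
    """Number of silent leaders a node times out on before reaching an honest one."""
    for k in range(f + 1):
        if leader(view + k, n) not in faulty:
            return k
    raise ValueError("more than f faulty leaders among f + 1 consecutive views")


def benign_message_count(n: int, k: int) -> int:
    """Messages for one view change when the first k leaders are silent.

    Each of the two phases costs (k + 1) * n node-to-leader messages
    plus n messages for the honest leader's multicast.
    """
    per_phase = (k + 1) * n + n
    return 2 * per_phase


k = first_honest_offset(view=1, n=4, f=1, faulty={1})
print(k, benign_message_count(4, k))  # 1 24
```

With an honest leader at the first attempt (k = 0) the count is 4n, i.e., linear in n, and each additional silent leader adds 2n messages in this model.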
We start by proving that if an honest node entered a new view, and the leader of that view is honest, then all the other honest nodes will also enter that view within a bounded time.
After GST, if an honest node enters view at time , and the leader of view is honest then all the honest nodes enter view by , i.e., if then .
Let be the first honest node that entered view at time . entered view since it received from such that (LABEL:alg:relibra:QC).
If then we are done, since when sent it also sent it to all the other honest nodes (LABEL:alg:relibra:leaderMulticastQC), which will be received by , and all the honest nodes will enter view .
Next, if then the only way for to send is if it gathered messages (LABEL:alg:relibra:leaderReceiveQC), meaning at least of the messages were sent by honest nodes. An honest node will send a message only after first receiving from s.t. (LABEL:alg:relibra:TC).
Since when receiving a an honest node sends the to (LABEL:alg:relibra:forwardTCtoK), then will receive by , will forward it to all other nodes by , who will send to by and by all honest nodes will receive from and enter view .
Next, assuming an honest node entered a new view, we bound the time it takes for at least honest nodes to enter the same view. Note that this time we do not assume anything about the leader of the new view, and it might be Byzantine.
After GST, when an honest node enters view at time , at least honest nodes enter view by , i.e., after GST for every there exists a group S of honest nodes s.t. and .
Let be the first node that entered view at time . entered since it received from and (LABEL:alg:relibra:QC). If is honest then we are done, since multicasted to all honest nodes (LABEL:alg:relibra:leaderMulticastQC), and within all honest nodes will also enter view by .
Next, if is Byzantine, then it might have sent to a subset of the honest nodes, potentially only to . In order to form a , had to receive messages (LABEL:alg:relibra:leaderReceiveQC), meaning that at least honest nodes sent to . Denote as the group of those honest nodes.
Each node in sent message since it received from for (LABEL:alg:relibra:TC). Note that different nodes in might have received from a different leader, i.e., might not be the same leader for each node in .
After a node in sent it will either receive a within and enter view , or timeout after and send with to (LABEL:alg:relibra:forwardTC). They will continue to do so when not receiving for the next views after . This ensures that at least one honest leader will receive after at most . Then, this honest leader will multicast the it received (LABEL:alg:relibra:leaderReceiveTC) and at most by , all the honest nodes will receive . The honest nodes will then send to the honest leader, which will be able to create and multicast it. The will thus be received by all the honest nodes by and we are done.
Next, we show that during the execution, an honest node will enter some new view.
After GST, some honest node enters a new view.
From Section 4.3, if an honest node enters some view , the time by which at least other honest nodes also enter is bounded. Eventually, those honest nodes will time out and will be invoked (Algorithm 3), which will cause them to send to .
If is honest, then it will send a to all the nodes (LABEL:alg:relibra:leaderReceiveTC) which will be followed by the leader sending a (LABEL:alg:relibra:leaderReceiveQC), and all honest nodes will enter view .
If is not honest then the protocol dictates that the honest nodes that wished to enter will continue to forward their message to the next leaders (up to , LABEL:alg:relibra:attemptedTC) until each of them receives . This is guaranteed since at least one of those leaders is honest.
The same process is then followed for (Algorithm 3), and eventually all of those honest nodes will enter view .
Cogsworth achieves eventual view synchronization (Property 1).
From Section 4.3 an honest node eventually will enter a new view, and by Section 4.3 at least honest nodes will enter the same view within a bounded time. By applying Section 4.3 again and again, eventually a view with an honest leader is reached (by applying Section 4.3 recursively) and by Section 4.3 all honest nodes will enter the view within .
Thus, for any , if the Cogsworth protocol is run with it is guaranteed that all honest nodes will eventually execute the same view for .
The above arguments can be applied inductively, i.e., there exists an infinite number of such intervals and views in which view synchronization is reached. ∎
Cogsworth achieves synchronization validity (creftypecap 2).
To enter a new view a is needed, which consists of messages, i.e., at least are from honest nodes. An honest node will send a message only when it receives a message, which in turn requires messages, meaning at least one of those messages came from an honest node.
An honest node will send when the upper-layer protocol invokes while it was in view . ∎
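The counting step in the proof above can be made concrete with a short sketch. The certificate sizes used here are assumptions, since the exact thresholds are not legible in the text: we take the common Byzantine setting of n = 3f + 1 nodes, a large certificate of 2f + 1 messages, and a small certificate of f + 1 messages. The helper name is hypothetical.

```python
def min_honest_in_quorum(quorum_size: int, f: int) -> int:
    """Fewest honest senders in a set of distinct-sender messages.

    At most f senders are Byzantine, so at least quorum_size - f are honest.
    """
    if quorum_size < 0 or f < 0:
        raise ValueError("quorum_size and f must be non-negative")
    return max(quorum_size - f, 0)


# Assumed thresholds (not taken from the text): 2f + 1 and f + 1.
for f in range(1, 6):
    large = 2 * f + 1
    small = f + 1
    # A certificate of 2f + 1 messages contains at least f + 1 honest senders.
    assert min_honest_in_quorum(large, f) == f + 1
    # A certificate of f + 1 messages contains at least one honest sender.
    assert min_honest_in_quorum(small, f) == 1
```

Under these assumed thresholds, any view entered by an honest node traces back to at least one honest node that wished to advance, which is the validity property being argued.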
This concludes the proof that Cogsworth is a synchronizer for any . Similar to the broadcast-based synchronizer, it allows upper-layer protocols to determine the time they spend in each view.
Latency and communication
Let be the maximum view an honest node is in at GST, and let denote the number of consecutive Byzantine leaders after . Assuming that leaders are randomly allocated to views, then. This means that in the worst case of , then .
When honest nodes at view want to advance to view and is honest, all honest nodes enter view in constant time (Section 4.3). Hence the latency for view synchronization is, in general, . By the same reasoning, this also holds for the interval between any two consecutive view synchronizations (see Section 3).
In the worst case of , where is the number of actual failures during the run, the latency is linear in the view duration, i.e., . But, in the expected case of a constant number of consecutive Byzantine leaders after , the expected latency is .
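To illustrate why the expected number of consecutive Byzantine leaders is a constant, the following sketch computes it under a simplifying assumption that is ours and not necessarily the paper's exact model: each view's leader is drawn independently and uniformly at random from the n nodes, of which f are Byzantine. The run of Byzantine leaders before the first honest one is then geometric, with mean f / (n - f). The function names are hypothetical.

```python
from fractions import Fraction


def expected_byzantine_run(n: int, f: int) -> Fraction:
    """Mean number of consecutive Byzantine leaders before the first honest one.

    Assumption: leaders are sampled independently and uniformly per view.
    """
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n")
    p = Fraction(f, n)  # probability that a given leader is Byzantine
    return p / (1 - p)  # mean of a geometric count of failures


def truncated_mean(n: int, f: int, terms: int = 200) -> float:
    """Numerical check: sum of k * P(run length = k) over the first `terms` values."""
    p = f / n
    return sum(k * (p ** k) * (1 - p) for k in range(terms))
```

For example, `expected_byzantine_run(4, 1)` evaluates to `Fraction(1, 3)`. Whenever n > 3f, the mean f / (n - f) is below 1/2 regardless of the system size, which is consistent with a constant expected latency.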
For communication complexity, there is a difference between Byzantine failures and benign ones. If a Byzantine leader of a view obtains for , then it can forward the to all the leaders that follow view and those leaders will multicast the message (LABEL:alg:relibra:leaderReceiveTC), leading to communication complexity.
In the case of benign failures, communication complexity depends on , since the first correct leader after will get all nodes to enter its view and achieve view synchronization. The benign leaders before it only add latency and do not increase the overall number of messages sent. Thus, in general, the communication complexity with benign failures is . In the worst case of , communication complexity is , but in the average case it is linear, i.e., . By the same reasoning, this also holds between any consecutive occurrences of view synchronization (see Section 3).
To sum up, the expected latency for both benign and Byzantine failures is , and worst-case . Communication complexity for Byzantine nodes is optimistically and expected and for benign nodes is expected and worst-case .
The three presented synchronizers have tradeoffs in their latency and communication costs. Hence, a protocol designer may choose a synchronizer based on its needs and constraints. It may be possible to create combinations of the three protocols and achieve hybrid characteristics; we leave such variations for future work.
In addition, there are differences in the constraints on in these protocols, namely, the time interval between two consecutive calls to (see creftypecap 1). The view doubling synchronizer prescribes a precise : each view's duration is exactly twice that of its predecessor. The other two synchronizers lower bound , in the broadcast-based by , and in Cogsworth by .
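The view doubling schedule mentioned above can be sketched in a few lines. This is a minimal illustration, assuming views are numbered from 0 and that the first view lasts some base duration; both the numbering and the `base` parameter are hypothetical and chosen only for the example.

```python
def view_duration(view: int, base: int) -> int:
    """Duration of a view when each view lasts twice as long as its predecessor."""
    if view < 0:
        raise ValueError("view must be non-negative")
    return base * 2 ** view


def view_start(view: int, base: int) -> int:
    """Local time at which a node enters `view`, measured from the start of view 0.

    This is the sum of the durations of views 0 .. view - 1.
    """
    if view < 0:
        raise ValueError("view must be non-negative")
    return base * (2 ** view - 1)


# With base = 1 the first four views last 1, 2, 4 and 8 time units.
durations = [view_duration(v, 1) for v in range(4)]
```

Because the schedule is fixed in advance, a node needs no messages to decide when to advance. The cost is that view durations grow exponentially, which is the tradeoff against the two message-based synchronizers.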
Another difference between the three protocols is that Cogsworth is the only one that uses the mapping as part of the protocol itself. While the view doubling and broadcast-based synchronizers achieve synchronization of consecutive views after the first one is reached, Cogsworth ensures this only for views such that is honest. An upper-layer protocol that uses Cogsworth as a synchronizer can reuse the same mapping and be certain that every view synchronization will also have an honest leader at the upper-layer protocol.
5 Related work
The seminal work of Chandra and Toueg [chandra1996weakest, chandra1996unreliable] introduces the leader election abstraction, denoted , and proves it is the weakest failure detector needed to solve consensus. By using , consensus protocols can usually be written in a more natural way. The view synchronization problem is similar to , but differs in several ways. First, it lacks any notion of leader and isolates the view synchronization component. This makes synchronizers relevant for a broad set of asynchronous solutions, e.g., the TLC framework by Ford [ford2019threshold]. Second, view synchronization adds recurrence to the problem definition. Third, it has a built-in notion of view duration: nodes commit to spend a constant time in a view before moving to the next. Last, this paper focuses on latency and communication costs of synchronizer implementations.
View synchronization in consensus protocols
The idea of doubling round duration to cope with partial asynchrony borrows from the DLS work [dwork1988consensus], and has been employed in PBFT [castro1999practical] and in various works based on DLS/PBFT [baudet2019librabft, buchman2018tendermint, yin2019hotstuff]. In these works, nodes double the length of each view when no progress is made. More recently, HotStuff [yin2019hotstuff] encapsulated view synchronization in a module named “Pacemaker”. Here, we provide a formal definition, concrete solutions, and performance analysis of such a module.
Latency and message communication for consensus
Dutta et al. [dutta2007overhead] look at the number of rounds it takes to reach consensus in the crash-fail model after a time defined as GSR (Global Stabilization Round) which only correct nodes enter. This work provides an upper and a lower bound for reaching consensus in this setting. Other works such as [alistarh2008solve, dutta2005fast] further discuss the latency for reaching consensus in the crash-fail model. These works focus on the latency for reaching consensus after GST. Both bounds are tangential to our performance measures, as they analyze round latency.
Dolev et al. [dolev1985bounds] showed a quadratic lower bound on communication complexity to reach Byzantine broadcast, which can be reduced to consensus. This lower bound is an intuitive baseline for work like ours, though it remains open to prove a quadratic lower bound on view synchronization per se.
Notion of time in distributed systems
Causal ordering is a notion designed to give partial ordering to events in a distributed system. The best-known protocols to provide such ordering are Lamport Timestamps [lamport1978time] and vector clocks [fidge1988timestamps]. Both works assume a non-crash setting.
Recently, Ford published preliminary work on Threshold Logical Clocks (TLC) [ford2019threshold]. In a crash-fail asynchronous setting, TLC places a barrier on view advancement, i.e., nodes advance to view only after a threshold of them reached view . A few techniques are also described on how to convert TLCs to work in the presence of Byzantine nodes. The TLC notion of a view “barrier” is orthogonal to view synchronization, though a 2-phase TLC is very similar to our reliable broadcast synchronizer.
The clock synchronization problem [lamport1985synchronizing] in a distributed system requires that the maximum difference between the local clocks of the participating nodes be bounded throughout the execution, which is possible since most works assume a synchronous setting. The clock synchronization problem is well-defined and well-studied, and there are many different algorithms to ensure this in different models, e.g., [cristian1989probabilistic, kopetz1987clock, srikanth1987optimal]. In practical distributed networks, the most prevalent protocol is NTP [mills1991internet]. Again, clock synchronization is orthogonal to view synchronization: the latter guarantees that nodes enter and stay in a view within a bounded window, but it does not place any bound on the views of different nodes at any point in time.
We formally defined the Byzantine view synchronization problem, which bridges classic works on failure detectors aimed at solving one-time consensus and SMR, which consists of multiple one-time consensus instances. We analyzed protocols that achieve view synchronization in terms of latency and communication complexity. We presented three synchronizer protocols along with their performance tradeoffs.