Blockchain technology was developed to support a decentralized and transparent society. The first blockchain system, Bitcoin [Nakamoto_bitcoin:a], proposed a peer-to-peer electronic cash system using a proof-of-work (PoW) consensus algorithm. In this system, nodes in a peer-to-peer network manage a distributed ledger, called a blockchain, in which transactions are stored. Each node writes the latest transactions into a new block, which may become part of the blockchain, and then propagates its block to other nodes. Next, each node determines whether to agree on (or vote for) the received block, and the block is connected to the existing blockchain when enough votes are collected. After that, a new round starts, and the above process is repeated. This process is conducted according to a consensus protocol. Note that only transactions recorded on the blockchain are regarded as valid. Therefore, to ensure security, it is important for the nodes to agree on the same ledger. In practice, due to block propagation delays and the existence of attackers, nodes can have different views of the blockchain. In this case, nodes should resolve the disagreement, or the protocol should ensure that it never occurs. Otherwise, an attacker can make an invalid transaction, such as a double-spending transaction, which undermines the system's security significantly. To prevent this, each node should not vote for conflicting blocks. Conflicting blocks come in two types: 1) a block including conflicting transactions, such as double-spending or invalid transactions, and 2) two or more different blocks committed in the same round. How conflicting blocks are resolved depends on each consensus protocol.
Safety and liveness are the most important properties of a consensus algorithm. A consensus algorithm should satisfy safety, meaning that it never commits conflicting blocks. It should also satisfy liveness, meaning that it eventually extends the blockchain by adding a new block. However, by the FLP impossibility result [fischer1982impossibility], both properties cannot be satisfied simultaneously in an asynchronous network. Thus, each consensus algorithm has to choose which property to sacrifice. For example, the Nakamoto consensus sacrifices safety (liveness over safety), while BFT-based consensus algorithms sacrifice liveness (safety over liveness).
Currently, many consensus algorithms exist, including PoW and Byzantine fault tolerance (BFT) based consensus algorithms. In this paper, we focus on BFT-based consensus algorithms and, specifically, analyze the LFT2 [LFT2] consensus algorithm. We first model the LFT2 consensus algorithm and formalize it using a state machine. Then we prove that LFT2 satisfies the safety and liveness properties under certain assumptions. Note that, according to the FLP impossibility result, we cannot prove that LFT2 satisfies both properties without any assumptions. In addition, we define a metric that represents liveness quality (i.e., a specific rate of creating new blocks). We also compare the LFT2 consensus algorithm with two other BFT-based consensus algorithms, PBFT [castro1999practical] and Hotstuff [yin2018hotstuff]. From this comparison, we find trade-offs among these consensus algorithms. Finally, we simulate LFT2 to measure liveness quality using our metric.
In summary, this paper makes the following contributions:
We formalize the LFT2 consensus algorithm.
We prove the LFT2 consensus algorithm satisfies liveness and safety under certain assumptions.
We simulate LFT2 to measure liveness quality.
2.1 Practical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance (PBFT) [castro1999practical] is a BFT-based consensus algorithm. The paper [castro1999practical] originally proposed it to tolerate byzantine failures in a replication system. To put it simply, PBFT is a practical algorithm for consensus in a distributed system in which at most f byzantine nodes out of 3f+1 nodes can exist in an asynchronous network. PBFT is represented in Figure 1.
Since PBFT is designed for a distributed replication system, it needs request and reply phases. However, because neither phase is needed for consensus, PBFT for a blockchain would have only the pre-prepare, prepare, and commit phases. Each round has a designated leader, and at the start of the round the leader proposes a new block, which the nodes should vote for, and sends it to all other nodes. When the leader proposes a new block, the pre-prepare phase starts. After the proposal is sent, the prepare phase starts. In the prepare phase, all nodes receive the block and check it. If the block is valid, each node broadcasts a Prepare message to the other nodes. While broadcasting the message, a node also receives Prepare messages from other nodes. If a node receives 2f matching Prepare messages, the node enters the prepared state. In the commit phase, each node in the prepared state broadcasts a Commit message. After broadcasting, each node receives Commit messages, and if the number of Commit messages received from different nodes is greater than or equal to 2f+1, the node accepts the block proposed by the leader. In this case, we state that the block is committed.
In addition, PBFT has a view-change step for the case in which a leader fails to send a new block to the other nodes. If a view change occurs, the next leader sends another new block to the other nodes. Because this step is somewhat complicated, we do not describe it here. For more details, please refer to the paper [castro1999practical].
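The prepare/commit thresholds above can be sketched as follows. This is an illustrative sketch under the standard n = 3f + 1 assumption; the class and method names are ours, not from the PBFT paper:

```python
# Illustrative sketch of PBFT's quorum thresholds at a single replica.
# Assumes n = 3f + 1 total nodes; names are ours, not from the PBFT paper.

class PBFTReplica:
    def __init__(self, n):
        self.n = n
        self.f = (n - 1) // 3          # tolerated byzantine nodes: n >= 3f + 1
        self.prepares = set()          # ids of nodes that sent Prepare for the block
        self.commits = set()           # ids of nodes that sent Commit for the block

    def on_prepare(self, sender):
        self.prepares.add(sender)

    def on_commit(self, sender):
        self.commits.add(sender)

    def prepared(self):
        # prepared state: pre-prepare from the leader plus 2f matching Prepares
        return len(self.prepares) >= 2 * self.f

    def committed(self):
        # committed: 2f + 1 matching Commit messages from distinct nodes
        return len(self.commits) >= 2 * self.f + 1
```

With n = 4 (so f = 1), a replica is prepared after 2 Prepare messages and commits after 3 Commit messages.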
Hotstuff is a leader-based BFT replication protocol published in 2018 by M. Yin et al. [yin2018hotstuff]. A big difference between this protocol and PBFT is leader dependency. In PBFT, the leader is crucial only in the pre-prepare phase, but in Hotstuff, the leader is crucial in every phase. This is because, in Hotstuff, each node sends its messages only to the leader, and the leader propagates them to the other nodes in every phase. This is a major difference between PBFT and Hotstuff, and it is why we refer to Hotstuff as a leader-based BFT protocol. Since each node does not directly broadcast its vote messages to the other nodes, the leader node could act maliciously. To prevent this, Hotstuff uses a quorum certificate (QC), which proves that the leader received correct vote messages.
The basic algorithm of Hotstuff is represented in Figure 2. Similar to PBFT, Hotstuff allows nodes to reach consensus when at most f faulty nodes exist in a system of n = 3f+1 nodes. In the prepare phase, each node sends a new-view message to the leader, and the leader collects n−f new-view messages. Then the leader sends a prepare message to all nodes in the system. In the pre-commit phase, each node checks the received prepare message and, if it is valid, sends a prepare vote message to the leader. If the leader receives n−f prepare vote messages, it sends a pre-commit message to each node. In the commit phase, each node validates the pre-commit message and, if it is proper, sends a pre-commit vote message to the leader. If the leader receives n−f pre-commit vote messages, it sends a commit message to each node. In the decide phase, each node checks the commit message received from the leader and, if it is proper, sends a commit vote message to the leader. As in the previous phases, if the leader receives n−f commit vote messages, it makes a decide message and sends it to each node. In any of the above steps, the leader may turn out to be a byzantine node. A node can send a new-view message to the next leader when it judges that the current leader is byzantine. If the next leader receives n−f new-view messages, the new leader starts consensus by repeating the above process. This new-view message process is similar to the view-change process in PBFT.
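The leader-side quorum collection that repeats in every phase can be sketched as follows. This is a simplification: a real QC aggregates signatures rather than voter ids, and all names here are illustrative:

```python
# Minimal sketch of the leader's per-phase quorum collection in Hotstuff.
# A quorum certificate (QC) here is just the set of voter ids; the real
# protocol aggregates signatures. Names are illustrative.

def quorum_size(n):
    """Votes needed per phase: n - f, which equals 2f + 1 when n = 3f + 1."""
    f = (n - 1) // 3
    return n - f

def collect_qc(n, votes):
    """Return a QC (frozenset of voters) once n - f distinct votes arrive, else None."""
    voters = set(votes)
    if len(voters) >= quorum_size(n):
        return frozenset(voters)
    return None
```

The same function serves every phase (prepare, pre-commit, commit, decide), which is what makes the rounds symmetric and amenable to pipelining.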
As shown in Figure 2, each round is symmetric. Exploiting this characteristic, the Hotstuff paper [yin2018hotstuff] suggests an advanced version called Chained Hotstuff, which has higher scalability. Chained Hotstuff can be simply explained as a pipelined version of Hotstuff, as described in Figure 3. In each round, the leader node makes a message linked with the previous block and sends it to the other nodes. Each node receives the message from the leader and then verifies it. A received message carries four parts, one per pipelined phase, and each node has to check all four parts. In summary, Chained Hotstuff pipelines the phases across rounds and thereby gains scalability without losing security, since no consensus steps are removed.
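The pipelining idea can be illustrated with a toy decision rule. This sketch assumes a simplified "three QCs on top" criterion and ignores view numbers, so it is not the paper's exact rule:

```python
# Toy illustration of Chained Hotstuff pipelining: block i's proposal carries
# a QC for block i-1, so one generic phase per round drives four logical
# phases for four blocks in flight. Simplifying assumption: a block is
# decided once three QCs exist on top of it (the paper's rule also checks
# that the views are consecutive).

def decided_blocks(chain):
    """chain: list of block labels, oldest first, where block i+1 carries a
    QC for block i. Block i is decided once blocks i+1, i+2, i+3 exist,
    since those carry the three QCs on top of block i."""
    return [b for i, b in enumerate(chain) if i + 3 < len(chain)]
```

For a chain of five proposals, only the two oldest blocks have accumulated three QCs, so only they are decided; the rest are still in flight.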
The basic algorithm of LFT2 is similar to PBFT. A leader makes a new block and broadcasts it to the other nodes. Each node knows the leader of each round, so it can check that the received block was proposed by a proper (or valid) leader. If the received block does not have any error, each node makes and broadcasts a vote message. If a node gets enough vote messages, the block becomes a candidate block, and the previous candidate block becomes a committed block. Figure 4 represents LFT2.
To explain the LFT2 algorithm in detail, its assumptions and rules are required. In the LFT2 consensus algorithm, there are two types of nodes, honest nodes and byzantine nodes. These nodes take the following actions.
If a node receives a message, it sends the message to neighbor nodes at a specific time (gossip communication).
A leader selection algorithm exists and each node knows the order.
Each node has a local timer.
Each node already knows a cipher suite, and every message includes digital signature.
A byzantine node can delay or may not send a message.
A byzantine node can send different messages to different nodes.
A byzantine node cannot generate other node’s digital signature.
In LFT2, there exist two steps, propose and vote. In the propose step, a leader proposes a new block and broadcasts it to the other nodes. In the vote step, each node checks the received block and sends a vote message to the other nodes. The basic rules of LFT2 are as follows.
If the propose step starts, the ProposeTimer starts.
If ProposeTimeout occurs, a failure vote proceeds.
In the vote step, if enough vote messages arrive but consensus is not completed, the VoteTimer starts.
If consensus completes within a fixed time, the VoteTimer stops.
If consensus does not complete within a fixed time, VoteTimeout occurs.
A leader makes only one block in one round.
A validator receives a new block from the leader.
Validators check the block information (if it’s proposed by a proper leader, has the correct previous hash, and is connected with the candidate block, etc).
If the block is valid, validators send a vote message to other nodes.
If a node receives enough votes for a block of a higher round or a block with a higher height than the candidate block that the node views, the node changes this block to a candidate block.
A node commits the previous candidate block when replacing the current candidate block with a new candidate block.
We define states and transitions for each node based on above rules in Section 3.
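The candidate-then-commit rule described by the last two rules above can be sketched as follows; the names are illustrative and the vote-counting machinery is elided:

```python
# Sketch of LFT2's two-step commit rule: a block that gathers a quorum of
# votes becomes the candidate block, and the previous candidate is committed
# at that moment. Assumes n = 3f + 1 and a 2f + 1 vote quorum; names are
# illustrative, not from the LFT2 implementation.

class LFT2Node:
    def __init__(self, n):
        self.n = n
        self.f = (n - 1) // 3
        self.quorum = 2 * self.f + 1
        self.candidate = None          # block with enough votes, not yet committed
        self.committed = []            # committed chain, oldest first

    def on_quorum(self, block):
        """Called when `quorum` votes for `block` have been collected."""
        if self.candidate is not None:
            self.committed.append(self.candidate)   # commit previous candidate
        self.candidate = block                      # this block becomes candidate
```

A block is thus committed one round after it gathers its quorum, which is why LFT2 takes two phases per committed block on average (see Section 5).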
3 System Formalization
In this section, we formalize the LFT2 protocol. This formalization helps to understand the LFT2 consensus protocol and to prove safety and liveness. Before the formalization, we first define variables. For the network formalization, m represents a message transmitted from one node to other nodes, and n indicates the number of nodes in the system. For the node formalization, we define two message sets: in and out. The set in contains messages including valid new information, and out contains messages that a node has to send to other nodes. The parameter h indicates a block height, and T and V are a set of collected transactions and a set of votes for a candidate block, respectively. Our model largely follows the paper [castro1999correctness].
3.2 The Multicast Channel Automaton
In this section, we model the network state and transitions of a blockchain system. The network state records, for each message, which nodes have not yet received it: it is a set of pairs ⟨m, X⟩, where m is a message and X is the set of intended receivers that have not yet received m. SEND(m) means sending message m to the node set X, so after SEND(m) the state should include ⟨m, X⟩. RECEIVE(m)_i means that node i receives message m, so we update the state by replacing ⟨m, X⟩ with ⟨m, X − {i}⟩, because node i has received the message. MISBEHAVE is similar to the RECEIVE process; the difference is that, because of misbehavior, the receiving node set becomes some X′ instead of X.
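The three transitions above can be sketched as a small automaton. The class and method names are ours; the formalization itself is in [castro1999correctness]-style I/O automata, not code:

```python
# Sketch of the multicast channel automaton: the state is a list of
# [message, pending-receiver-set] pairs. Names are illustrative.

class MulticastChannel:
    def __init__(self, n):
        self.nodes = frozenset(range(n))
        self.wire = []                       # list of [message, pending receiver set]

    def send(self, m):
        # SEND(m): every node should eventually receive m
        self.wire.append([m, set(self.nodes)])

    def receive(self, m, i):
        # RECEIVE(m)_i: remove node i from m's pending receiver set
        for entry in self.wire:
            if entry[0] == m and i in entry[1]:
                entry[1].discard(i)
                return True
        return False                         # i not a pending receiver of m

    def misbehave(self, m, new_receivers):
        # MISBEHAVE: the adversary replaces the pending receiver set of m
        for entry in self.wire:
            if entry[0] == m:
                entry[1] = set(new_receivers)
```

Note that `receive` is only enabled when node i is still in the pending set, mirroring the precondition of the RECEIVE transition.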
3.3 Auxiliary Functions
These auxiliary functions are used to represent the transitions below. Each function has its own input and output values. The first function takes a block height h as input and outputs the proposer node ID for h; it can be used to check whether the proposer (leader) is valid at block height h. The second function takes block information as input and outputs a boolean; it checks the correctness of the hash value of the block and returns whether the value is correct. Lastly, the third function returns a maximal set of identical votes in the vote set; it is used to check whether enough votes have been collected.
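Under round-robin leader selection, the three functions could look like the following sketch. The names `proposer`, `valid_hash`, and `max_matching_votes` are ours, and the hash scheme is a stand-in for the protocol's actual block hashing:

```python
# Illustrative sketch of the three auxiliary functions of Section 3.3.
# All names and the hashing scheme are our assumptions, for readability.
import hashlib
from collections import Counter

def proposer(h, n):
    """Proposer (leader) id for block height h under round-robin selection."""
    return h % n

def valid_hash(block, prev_block):
    """Check that `block` records the hash of its predecessor correctly."""
    expected = hashlib.sha256(repr(prev_block).encode()).hexdigest()
    return block.get("prev_hash") == expected

def max_matching_votes(votes):
    """Size of the largest set of identical votes in `votes` (0 if empty)."""
    return max(Counter(votes).values()) if votes else 0
```

A replica would compare `max_matching_votes(V)` against the 2f + 1 quorum to decide whether enough votes have been collected.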
3.4 The Replica Automaton
The above transitions represent state transitions when there are new inputs or outputs. Input transitions occur when a message is received from other nodes, and output transitions occur when a message should be sent to other nodes. In Algorithms 4, 5, and 6, Pre indicates the conditions required for the corresponding transition, and Eff indicates the result of the transition. For example, RECEIVE(NEW-BLOCK)_i means that node i receives a NEW-BLOCK message from another node, and the message should satisfy the conditions in Pre (see Algorithm 4). The result of the transition is represented in Eff. RECEIVE(NEW-BLOCK)_i occurs when a NEW-BLOCK message arrives, RECEIVE(VOTE)_i occurs when a VOTE message arrives, and RECEIVE(TIMEOUT)_i occurs when a TIMEOUT message arrives. As another example, we describe SEND(m), represented in Algorithm 5. It means that node i sends message m to the other nodes, so m should be in out; after SEND(m) occurs, m is removed from out.
Algorithm 6 represents state transitions that occur internally in a node. SEND_BLOCK() occurs when node i is the proposer, so the node has to make a new block and send it to the other nodes. SEND_TIMEOUT occurs when a node should have received a NEW-BLOCK message but none arrived before ProposeTimeout. COMMIT occurs when enough VOTE messages for committing a new block are received. VOTE_FAIL occurs when VoteTimeout occurs, which means that not enough VOTE messages were received (i.e., the number of matching votes does not reach the quorum). In this case, consensus has failed. Lastly, VOTE_TIMEOUT occurs when enough TIMEOUT messages arrive, which means the leader node did not send a NEW-BLOCK message.
3.5 Consensus Process
Next, we formalize the LFT2 consensus process using these transitions. First, the leader should send a new block to the other nodes. The other nodes wait for the new block; if it does not arrive before ProposeTimeout occurs, each node sends a TIMEOUT message and votes on it. On the other hand, if the new block arrives in time, the node sends a VOTE message to the other nodes. In the vote phase, if enough TIMEOUT messages arrive, VOTE_TIMEOUT executes. If VoteTimeout occurs, VOTE_FAIL executes, and if neither event occurs and enough votes for the new block arrive, COMMIT executes. In this case, the round ends successfully.
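The possible round outcomes described above can be summarized in a small decision function. This is a sketch, with the timeout events reduced to boolean and integer inputs:

```python
# Sketch of one LFT2 round outcome at a single node: propose, then vote,
# with ProposeTimeout and VoteTimeout guarding the two steps. Assumes
# n = 3f + 1 and a 2f + 1 vote quorum; names are illustrative.

def run_round(block_arrived, votes, n):
    """Outcome of one round for one node.
    block_arrived: did a NEW-BLOCK arrive before ProposeTimeout?
    votes: number of matching VOTE messages received before VoteTimeout."""
    f = (n - 1) // 3
    if not block_arrived:
        return "VOTE_TIMEOUT"        # enough TIMEOUT messages: the leader failed
    if votes >= 2 * f + 1:
        return "COMMIT"              # quorum reached: the candidate advances
    return "VOTE_FAIL"               # VoteTimeout fired with too few votes
```

Only the COMMIT outcome advances the candidate block; the other two outcomes end the round without progress and hand the next round to a new leader.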
4 Safety and Liveness
In this section, we first define safety and liveness and prove that LFT2 satisfies both under certain conditions. In addition, to measure the liveness quality of LFT2, we suggest a metric that represents the average rate of generating committed blocks.
Before proving that the LFT2 satisfies the safety property, we define safety below.
(Safety) We state that a blockchain system satisfies safety when a conflicting block is never committed.
Intuitively, satisfying the safety property means that conflicting blocks are never committed. In other words, safety asserts that nothing bad happens, where a bad thing means that conflicting blocks are committed. Conflicting blocks come in two types: the first type is a block including conflicting transactions, such as double-spending transactions or invalid transactions; the second type consists of blocks committed at the same height. In Lemmas 1 and 2, we prove that the second and first types of conflicting blocks, respectively, cannot be committed in LFT2.
In LFT2, two different blocks cannot be committed at the same height when at most f byzantine nodes exist in a system of 3f+1 nodes.
Assume that two different blocks A and B are committed at height h. This means that at least 2f+1 nodes committed block A and at least 2f+1 nodes committed block B. Therefore, at least 2(2f+1) = 4f+2 votes were cast at height h. However, because at most f byzantine nodes can vote twice at the same height, at most (3f+1) + f = 4f+1 votes can exist, which is a contradiction. Hence, two different blocks cannot be committed at the same height.
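The counting argument can be checked numerically for a range of f:

```python
# Numeric check of the counting argument: with n = 3f + 1 nodes, two blocks
# at the same height would need 2(2f + 1) votes, but even if all f byzantine
# nodes vote twice only n + f votes exist.

def fork_possible(f):
    n = 3 * f + 1
    votes_needed = 2 * (2 * f + 1)   # two quorums of 2f + 1 votes
    votes_available = n + f          # each node votes once, byzantine twice
    return votes_available >= votes_needed

# 4f + 1 < 4f + 2 for every f, so a same-height fork is never possible
assert not any(fork_possible(f) for f in range(100))
```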
In LFT2, conflicting transactions cannot be committed when at most f byzantine nodes exist in a system of 3f+1 nodes.
By Lemma 1, a fork (i.e., committing different blocks at the same height) cannot happen in LFT2. Without a fork, committing a conflicting transaction requires committing a block that contains it, which needs 2f+1 votes. Honest nodes do not vote for a block containing conflicting or invalid transactions, so all 2f+1 votes would have to come from malicious nodes. However, because at most f byzantine nodes can exist, conflicting transactions cannot be committed.
From both lemmas, we show that LFT2 satisfies the safety property.
LFT2 satisfies the safety property.
Next, we analyze liveness of LFT2. We first define liveness below.
(Liveness) We state that a blockchain system satisfies liveness when new blocks keep being committed without the system getting stuck.
Intuitively, satisfying the liveness property means that new blocks will be committed continuously without the system getting stuck. In other words, liveness asserts that something good eventually happens, where the good thing is committing a block.
Now, under some assumptions, we prove the liveness of LFT2. Note that we cannot prove liveness without any network assumptions, according to the FLP impossibility result [fischer1982impossibility].
We assume that every non-byzantine node becomes ready to enter a new round at some specific time. We also define Δ as the time difference between when the first node and when the last node become ready for the next round. Then, if the two timeout conditions derived in the proof (one on ProposeTimeout and one on VoteTimeout) are satisfied, consensus completes. Furthermore, if the leader is a non-byzantine node, a new block will eventually be committed.
Assume that nodes i and j are non-byzantine and that they are the first and last nodes, respectively, to become ready for the new round. Let t1 be the time when node i becomes ready; then, by the definition of Δ, node j becomes ready at t1 + Δ. At time t1, the propose timer of node i starts and runs for the duration of ProposeTimeout. If node i does not receive a NEW-BLOCK message before the timer expires, it sends a TIMEOUT message. To prove the theorem, consider the worst case: the leader of the new round is node j, i.e., the last node to become ready. To complete consensus, the NEW-BLOCK message must arrive at node i before node i's propose timer expires. However, because node j only starts proposing at t1 + Δ, the latest time at which node i receives the NEW-BLOCK message is t1 + Δ plus the maximum network delay. As a result, to complete the consensus, ProposeTimeout must be at least Δ plus the maximum network delay (the ProposeTimeout condition).
After receiving the NEW-BLOCK message, the vote timer of node i runs for the duration of VoteTimeout. If node i does not receive enough VOTE messages before the timer expires, it regards the consensus as failed and moves to the next step. Thus, to complete the consensus, enough VOTE messages must arrive at node i before its vote timer expires. The worst case here is that node i receives the NEW-BLOCK message immediately while another VOTE sender receives it only after the maximum network delay, and that sender's VOTE message is then delayed by the maximum network delay again before reaching node i. As a result, to complete the consensus, VoteTimeout must be at least twice the maximum network delay (the VoteTimeout condition).
Obviously, if there were no ProposeTimeout and the leader were malicious, the leader might never send a message, so the replicas would never receive a NEW-BLOCK message and consensus would never complete. Similarly, if there were no VoteTimeout and the leader were malicious, the leader could generate conflicting NEW-BLOCK messages, so the replicas would never receive enough VOTE messages, and consensus would again never complete. Thus, both timeouts must exist. Finally, if both the ProposeTimeout condition and the VoteTimeout condition derived above are satisfied, the consensus completes.
To show that a block is committed when the leader is non-byzantine, assume for contradiction that after the consensus completes, a new block is not committed, which means that a timeout occurs at some point. First, assume that a propose timeout occurs. Since the number of byzantine nodes is at most f and the ProposeTimeout condition is satisfied, the only case in which this can occur is when the leader is byzantine, which is a contradiction.
Next, assume that a vote timeout occurs. Similarly, since the number of byzantine nodes is at most f and the VoteTimeout condition is satisfied, the only case in which this can occur is when the leader is malicious. A vote timeout means that not enough VOTE messages arrive even though no more than f byzantine nodes exist. This implies that conflicting NEW-BLOCK messages were propagated to the nodes, and the only node that can make conflicting NEW-BLOCK messages is the leader: if any other node made them, the messages would be filtered out by each node using the leader-validity check. In conclusion, both timeout cases can occur only when the leader is byzantine, which is a contradiction. Therefore, if the leader is a non-byzantine node, a new block will be committed.
Moreover, we define a metric to measure how good the liveness of LFT2 is.
Let f be the total number of faulty nodes in the LFT2 system, and let B_k^f be the number of blocks committed while the leader changes k times. We define the liveness quality metric as the long-run average rate of generating committed blocks, i.e., B_k^f / k as k grows.
The term B_k^f indicates the number of committed blocks during k leader changes when the system has f faulty nodes.
Then we prove that the minimum average rate of generating committed blocks in LFT2 is (n − f)/n.
If the timeout conditions of Theorem 4.2 are satisfied at each round, then the long-run average B_k^f / k is at least (n − f)/n. Here, Δ indicates the time difference between when the first node and when the last node become ready for the next round.
LFT2 uses a round-robin leader selection algorithm. Therefore, each node becomes the leader exactly once while the leader changes n times. By Theorem 4.2, if the leader is a non-byzantine node, a new block is committed by consensus. Thus, during any n consecutive leader changes, at least n − f blocks are committed, i.e., B_n^f ≥ n − f. By the division theorem, we can write k = qn + r with 0 ≤ r < n. Using both results, B_k^f ≥ q(n − f). Therefore, B_k^f / k ≥ ((k − r)/k) · ((n − f)/n), and as k grows this lower bound approaches (n − f)/n. This completes the proof.
In LFT2, if the timeout conditions of Theorem 4.2 are satisfied at each round, then the expected number of committed blocks satisfies E[B_k] ≥ k(n − F)/n in the long run.
Here, Δ indicates the time difference between when the first node and when the last node become ready for the next round. The parameter B_k represents the number of committed blocks during k leader changes when the system has at most F faulty nodes.
Let k_j be the number of rounds in which exactly j faulty nodes exist during the k leader changes. Obviously, k_0 + k_1 + ... + k_F = k; here, note that one round passes whenever the leader changes. Then we can represent B_k as the sum of the terms B_{k_j}^j, where B_{k_j}^j means the number of committed blocks during the k_j leader changes in which the system has j faulty nodes. Using the linearity of expectation, E[B_k] = E[B_{k_0}^0] + ... + E[B_{k_F}^F]. By Lemma 3, B_{k_j}^j is at least k_j(n − j)/n ≥ k_j(n − F)/n in the long run, so E[B_k] ≥ ((n − F)/n)(E[k_0] + ... + E[k_F]) = k(n − F)/n. This completes the proof.
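The bound can be sanity-checked with a toy round-robin simulation. This assumes, as Theorem 4.2 gives, that every round with an honest leader commits a block:

```python
# Monte-Carlo check of the liveness-quality bound: under round-robin leader
# selection with f faulty nodes out of n, the fraction of leader changes that
# commit a block is at least (n - f) / n. Simplifying assumption: every
# honest-leader round commits (Theorem 4.2), every faulty-leader round fails.
import random

def commit_rate(n, f, k, seed=0):
    rng = random.Random(seed)
    byzantine = set(rng.sample(range(n), f))     # which node ids are faulty
    commits = sum(1 for r in range(k) if (r % n) not in byzantine)
    return commits / k

# n = 21, f = 6 (the setting of the failure experiment in Section 6)
rate = commit_rate(n=21, f=6, k=21 * 1000)
assert rate >= (21 - 6) / 21 - 1e-9
```

When k is a multiple of n the rate is exactly (n − f)/n, matching the convergence value observed in the failure simulation of Section 6.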
5 Comparative Analysis
In this section, we compare LFT2 with two other consensus protocols. From this comparison, we study the trade-offs among the three consensus algorithms.
5.1 PBFT vs LFT2 vs Chained Hotstuff
Scalability. LFT2 [lft2_white] lies logically between PBFT [castro1999practical] and Hotstuff [yin2018hotstuff]. In broadcasting vote messages, LFT2 is similar to PBFT. On the other hand, because it does not commit a new block within one round, LFT2 is similar to Chained Hotstuff. Without any byzantine nodes, the average number of phases required for committing a block is one, two, and three for Chained Hotstuff, LFT2, and PBFT, respectively. All three consensus algorithms need three steps to commit one block, but the average number of phases per committed block depends on how they apply the pipelining technique. For this reason, in terms of scalability, Chained Hotstuff performs the best while PBFT performs the worst.
Network Bandwidth. PBFT has three phases in one round: the leader first broadcasts a message to the other nodes, the nodes vote on the message received from the leader in the second phase, and the nodes send commit messages based on the received votes in the last phase. In big-O notation, the first, second, and third phases have network bandwidth complexities of O(n), O(n^2), and O(n^2), respectively, where n is the number of nodes in the consensus system. LFT2 has two phases: the leader first broadcasts a message to the other nodes, and the nodes vote on it in the second phase. In big-O notation, the first and second phases have network bandwidth complexities of O(n) and O(n^2), respectively. Chained Hotstuff has only one phase, in which the leader broadcasts a message to the other nodes and each node responds to the leader, similar to a voting process. In big-O notation, each phase has a network bandwidth complexity of O(n). Considering the above, Chained Hotstuff has the best network bandwidth complexity, compared to PBFT and LFT2. In addition, even though PBFT and LFT2 have the same network bandwidth complexity in big-O terms, LFT2 uses less bandwidth than PBFT because LFT2 and PBFT have two and three phases per round, respectively.
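The per-round message counts behind these complexities can be made concrete. This sketch counts point-to-point messages only and ignores view changes:

```python
# Per-round point-to-point message counts behind the big-O comparison:
# a leader broadcast is n - 1 messages (O(n)), all-to-all voting is
# n(n - 1) messages (O(n^2)), and vote-to-leader is n - 1 (O(n)).
# View changes and request/reply traffic are ignored in this sketch.

def messages_per_round(protocol, n):
    broadcast = n - 1                  # leader -> all other nodes
    all_to_all = n * (n - 1)           # every node -> every other node
    if protocol == "PBFT":             # pre-prepare + prepare + commit
        return broadcast + 2 * all_to_all
    if protocol == "LFT2":             # propose + vote
        return broadcast + all_to_all
    if protocol == "ChainedHotstuff":  # one proposal + votes to the leader
        return broadcast + (n - 1)
    raise ValueError(protocol)
```

For n = 4, this gives 27 messages per round for PBFT, 15 for LFT2, and 6 for Chained Hotstuff, consistent with the ordering above.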
Decentralization. From the decentralization point of view, PBFT is the best because only one of its three phases involves the leader. In PBFT, only the pre-prepare phase is leader-dependent, while the prepare and commit phases are leader-independent. Therefore, the proportion of leader-involved phases, i.e., the ratio of the number of leader-involved phases to the number of phases in one round, is 33.3%. In Chained Hotstuff, however, every phase involves the leader: every phase is symmetric and the leader collects the vote messages, so the proportion of leader-involved phases is 100%. LFT2 is again positioned between PBFT and Chained Hotstuff: the propose phase is leader-dependent but the vote phase is leader-independent, which implies that the proportion of leader-involved phases is 50%. Considering the above, PBFT has the best decentralization, LFT2 is second, and Chained Hotstuff has the worst decentralization level.
|Metric|PBFT|LFT2|Chained Hotstuff|
|Scalability (avg. phases per block)|Worst (3)|Second (2)|Best (1)|
|Network bandwidth (big-O notation)|Worst (O(n^2))|Second (O(n^2))|Best (O(n))|
|Decentralization (leader-involved phases)|Best (33.3%)|Second (50%)|Worst (100%)|
6 Simulation
In this section, we simulate the LFT2 consensus algorithm using the LFT2 implementation [LFT2]. Our simulation measures the liveness quality metric by varying two timeouts: ProposeTimeout and VoteTimeout. In our simulation, 1) we investigate the relationship between the timeout and the metric by varying the number of nodes, and 2) the relationship between the timeout and the metric by varying the number of failed nodes. We set ProposeTimeout and VoteTimeout to the same value, as ICON LOOP [ICON] sets them, and simulate the timeout range from 0 to 4 seconds in 0.1-second increments.
The LFT2 implementation provides a simulation tool with a system console, which can control the simulation environment. In the original implementation code, the network delay was set to a random value between 0 and 1 second, and ProposeTimeout and VoteTimeout were set to 2 seconds. However, in a real network, delay does not follow such a uniform random function, so we changed the network delay model to one closer to reality. Because network delay changes dynamically for various reasons, we chose to collect real data rather than model the network delay theoretically. We collected network delay data from a website [Bitcoin_Monitoring] that provides network information about Bitcoin. We use the 'current block propagation delay distribution' of 3 February 2020. The delay distribution represents the elapsed time between the first reception of an INV message announcing a new block and the subsequent receptions from other peers in the Bitcoin network. The delay data contains some values over 4 seconds, but most of the data is below 4 seconds; thus, we ignore the data over 4 seconds. The probability distribution of the Bitcoin network delay is shown in Figure 5.
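Sampling from an empirical delay histogram, rather than a uniform delay, can be sketched as follows. The histogram values here are placeholders, not the measured Bitcoin data:

```python
# Sketch of replacing a uniform network delay with an empirical one: draw
# delays from a measured histogram. The histogram below is a made-up
# placeholder; the real data came from the Bitcoin monitoring site cited
# in the text.
import random

# (delay upper bound in seconds, probability mass) -- illustrative values only
HISTOGRAM = [(0.1, 0.30), (0.3, 0.45), (1.0, 0.15), (4.0, 0.10)]

def sample_delay(rng):
    """Pick a histogram bin by its mass, then draw uniformly within the bin."""
    x = rng.random()
    acc = 0.0
    for upper, mass in HISTOGRAM:
        acc += mass
        if x <= acc:
            return rng.uniform(0.0, upper)
    return HISTOGRAM[-1][0]          # guard against floating-point rounding

rng = random.Random(1)
delays = [sample_delay(rng) for _ in range(1000)]
assert all(0.0 <= d <= 4.0 for d in delays)
```

Capping the histogram at 4 seconds mirrors the choice above to discard the small fraction of measured delays over 4 seconds.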
Using the simulation tool provided by ICON LOOP [LFT2], we measure the liveness quality metric under various settings. We run experiments changing two variables: the number of nodes in the LFT2 network and the number of failed nodes. In both cases, we measure the metric while varying the timeout from 0 to 4 seconds in 0.1-second increments. Specifically, in the first case, we measure the metric under four different settings of the total number of nodes: 4, 10, 50, and 100. In the second case, we measure the metric while varying the number of failed nodes when the total number of nodes in LFT2 is 21. We calculate the metric by running the LFT2 simulator until more than 200 blocks are created.
The number of nodes. The values of the metric for varying numbers of nodes are represented in Figure 6. In this figure, we can see that the values are similar even when the number of nodes differs, which implies that the metric is rarely influenced by the number of nodes. Also, the value rises sharply between 0.2 and 0.6 seconds because most of the network delay is concentrated between 0.1 and 0.3 seconds (refer to Figure 5). This supports that our timeout condition in the liveness proof is reasonable when we set the timeout to 2 seconds. Finally, from Figure 6, we observe that the metric converges to 100% at timeouts greater than 2 seconds, which means that without any failed nodes, LFT2 commits all blocks properly.
Failure. We also simulate LFT2 by varying the extent of failure, with the number of all nodes set to 21. Figure 7 represents the value of the metric when the number of failed nodes ranges from zero to six out of 21 and the timeout ranges from 0 to 4 seconds. As shown in the figure, the metric converges to 1 − f/n as the timeout increases, where f/n indicates the fraction of failed nodes among all nodes. This conforms with Lemma 3. When the number of failed nodes is at most four, the metric converges within an error range of about 2% inside the simulated timeout range. Meanwhile, when five and six nodes fail, the timeout must be greater than 3.9 s and 5.3 s, respectively, for the metric to converge. Table 2 represents the timeout at which the metric first reaches its convergence value under the various settings of failed nodes. In Table 2, one can see that the greater the number of failed nodes, the greater the timeout at which the metric first reaches its convergence value.
7 Conclusion
Many blockchain consensus algorithms have been developed, and among the most important properties of a consensus algorithm are safety and liveness. In this paper, we analyzed LFT2, which is used in ICON. To do this, we formalized the protocol and analyzed the safety and liveness of LFT2. Proving liveness requires a certain network assumption. Therefore, to decide whether this assumption is reasonable, we simulated LFT2 and measured its liveness quality using our metric. The results show that when the timeout is set to a sufficiently large value (about 4 seconds), a high level of liveness can be guaranteed.