Window Based BFT Blockchain Consensus

06/11/2019 ∙ by Mohammad M. Jalalzai, et al. ∙ Louisiana State University

There is a surge of interest in blockchain technology, not only in the scientific community but in the business community as well. Proof of Work (PoW) and Byzantine Fault Tolerant (BFT) protocols are the two main classes of consensus protocols used in the blockchain consensus layer. PoW is highly scalable but very slow, with a throughput of about 7 transactions per second. BFT-based protocols are highly efficient, but their scalability is limited to only tens of nodes. One of the main reasons for this limitation is the quadratic O(n^2) communication complexity of BFT-based protocols for n nodes, which require n × n broadcasting. In this paper, we present the Musch protocol, which is BFT-based and provides communication complexity O(f n + n) for f failures and n nodes, where f < n/3, without compromising latency. Hence, the performance adjusts to f such that for constant f the communication complexity is linear. Musch achieves this by introducing the notion of exponentially increasing windows of nodes to which complaints are reported, instead of broadcasting to all the nodes. To our knowledge, this is the first BFT-based blockchain protocol that efficiently addresses the issues of communication complexity and latency simultaneously in the presence of failures.




I Introduction

Consensus is used by the nodes in the network to agree on a new block to be appended to the chain. A blockchain is composed of two main components: a cryptographic engine and a consensus engine. The main performance and scalability bottlenecks of a blockchain also lie in these components. Here we focus only on improving the consensus component of blockchains. As already mentioned, PoW-based protocols are highly scalable. In Bitcoin [1], one of the most successful implementations of blockchain technology, the number of nodes (replicas) is typically large, in the range of thousands [1, 2].

PoW involves the calculation of a number based on the hash value of a block, adjusted by a difficulty level. Solving this cryptographic puzzle is CPU intensive, which limits the rate at which nodes (miners) generate blocks. Bitcoin uses PoW, but its throughput reaches at most about 7 transactions per second [2], and a block is generated approximately every 10 minutes [1]. Additionally, the power used by Bitcoin mining in 2014 was between 0.1 and 10 GW, comparable to Ireland’s electricity consumption at that time [3]. Different solutions have been proposed: for example, Ethereum [4] uses a faster PoW, and BitcoinNG [5] uses two types of blocks, namely key blocks and micro-blocks, and achieves higher throughput than Bitcoin. But all these solutions fall well short of matching the throughput offered by leading credit-card companies, in both average and maximum transactions per second.
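To make the hash puzzle concrete, the following is a toy sketch of a PoW search. It is not Bitcoin’s actual block format or difficulty encoding: the function name `mine`, the 8-byte nonce encoding, and the expression of difficulty as a number of leading zero bits are all assumptions made for illustration.

```python
import hashlib


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Search for a nonce whose SHA-256 digest falls below a target.

    Assumption: difficulty is expressed as the number of leading zero
    bits required in the 256-bit digest (illustrative encoding only).
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # puzzle solved
        nonce += 1  # otherwise keep trying: this loop is the CPU-intensive part


def is_valid(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification needs a single hash, unlike the search above."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

Each additional difficulty bit doubles the expected number of hash attempts, which is why the difficulty level directly controls the block generation rate.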

On the other hand, BFT-based [6] protocols guarantee consensus in the presence of malicious (Byzantine) nodes, which can fail in arbitrary ways including crashes, software bugs, and even coordinated malicious attacks. Typically, BFT-based algorithms execute in epochs, where in each epoch the correct (non-malicious) nodes reach agreement on a set of proposed transactions. In each epoch there is a primary node that helps to reach agreement. Consensus is achieved during each epoch, and an entry or a set of entries is added to the log. If the primary is found to be Byzantine, a view change (selection of a new primary) takes effect to provide liveness. These protocols have been shown to achieve throughput of tens of thousands of transactions per second [7, 8]. However, their scalability has been tested only with a very small number of nodes, usually 10 to 20, due to the requirement for broadcast [2]; that is, they have quadratic communication complexity.

To address the scalability issues in BFT protocols, we introduce the Musch blockchain protocol. Musch is BFT-based and achieves $O(f'n + n)$ communication complexity in an epoch, where $f'$ is the actual number of Byzantine nodes ($f' < n/3$). For small (i.e., constant) $f'$ the communication complexity is linear, and hence Musch has scalable performance. Musch does not need to know the actual value of $f'$, since it automatically adjusts to the actual number of nodes that exhibit faulty behavior in each epoch. At the same time, the latency is comparable with other efficient BFT-based protocols [7, 9].

The performance of our algorithm is based on a novel mechanism of communication with a set of window nodes. Replicas use windows sliding over the node IDs to recover from faults during consensus. If a replica does not receive the expected messages from the primary, it complains to the window nodes, from which it recovers updates (see Fig. 1). Initially, the window consists of only one node. If the complaining replica does not receive a valid response from any of the window nodes, it moves to the next window, of double size, and sends the complaint there. The size of the last window is bounded, and it is chosen so that a correct node is guaranteed to lie within the last window. This gives $O(f'n + n)$ communication complexity. In this way, Musch avoids broadcasts while guaranteeing consistency.

PBFT FastBFT Aliph Musch
Total Replicas
Critical Path 4 4
TABLE I: Characteristics of state-of-the-art BFT protocols. The actual number of faulty nodes is $f'$, while $f$ is an upper bound, $f' \le f$.

Table I compares Musch with other state-of-the-art BFT-based protocols such as PBFT [9], FastBFT [10], and Aliph [8]. We compare the communication complexity, measured as the total number of messages exchanged during an epoch. Our algorithm’s performance depends solely on $f'$, while the other algorithms have quadratic communication complexity. Hence, our algorithm has an advantage when $f'$ is asymptotically smaller than $n$, resulting in less than quadratic communication complexity. When $f'$ is a constant, our algorithm is optimal. Additionally, we compare the critical path length, that is, the number of one-way message latencies it takes for a client request to be processed and the response to be received by the client. Note that the total number of nodes is $n = 3f + 1$, where $f$ is a conservative upper bound on the number of faulty nodes. The actual faults are bounded by $f' \le f$. Our algorithm does not need to know $f'$.

SBFT [11] has also tried to address the issue of scalability, and its authors report achieving better performance than Ethereum in their tests. In normal mode, SBFT’s message complexity depends on the number of collectors, whereas Musch’s is $O(f'n + n)$. For SBFT to avoid its fall-back protocol, the actual number of Byzantine nodes has to be known so that a suitable number of collectors can be chosen. In practice, however, it is impossible to know the actual number of Byzantine nodes, so the fall-back cannot always be avoided. The fall-back mode executes an efficient PBFT with $O(n^2)$ complexity. This complexity causes additional latency and performance degradation in SBFT.

Paper Outline

The rest of this paper is organized as follows. In Section II, we give the model of the distributed system. In Section III, we present our algorithm. Protocol checkpoints are presented in Section IV. We give the correctness analysis in Section V, and the communication complexity analysis in Section VI.

II System Model

Like other BFT-based state machine replication protocols, Musch assumes an adversarial failure model. Under this model, servers and even clients may deviate from their normal behavior in arbitrary ways, including hardware failures, software bugs, and malicious intent. Our protocol can tolerate up to $f$ Byzantine replicas, where the total number of replicas in the network is $n = 3f + 1$. Each replica is identified by a replica ID, an integer taken from the replica set. The actual number of Byzantine replicas in the network is denoted by $f'$, and at any moment during execution $f' \le f$. If $f' = 0$ then the execution is fault-free. However, $f'$ may not be known. Our algorithm’s communication complexity adapts to any value of $f'$.
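The fault bounds of the model translate into a few thresholds that recur throughout the protocol. The sketch below derives them from the network size, assuming $n = 3f + 1$, a commit quorum of $2f + 1$, and a complaint threshold of $f + 1$; the function and key names are hypothetical.

```python
def thresholds(n: int) -> dict:
    """Derive fault-tolerance thresholds from the number of replicas n.

    Assumptions (standard for BFT with n = 3f + 1 replicas):
      - f is the largest number of Byzantine replicas tolerated,
      - 2f + 1 matching signatures form a commit quorum,
      - f + 1 distinct complaints include at least one correct replica.
    """
    if n < 4:
        raise ValueError("need at least 4 replicas to tolerate one fault")
    f = (n - 1) // 3
    return {
        "f": f,
        "commit_quorum": 2 * f + 1,
        "complaint_threshold": f + 1,
    }
```

For example, a network of 13 replicas tolerates 4 Byzantine replicas, commits with 9 signatures, and needs 5 distinct complaints before a view change.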

Fig. 1: Windows of nodes

III Protocol

Our proposed protocol uses echo broadcast [12], where the primary proposes a block of transactions and the replicas respond by sending back signed hashes of the block. We assume strong, coordinated adversarial attacks by multiple malicious replicas. However, replicas are not able to break collision-resistant hashes, encryption, or signatures. We assume that all messages sent by the replicas and the primary are signed. For example, if the primary proposes a block of transactions to a replica, we assume that the block has been signed by the primary. Any unsigned message is discarded. To avoid repetition of messages and signatures, Musch uses signature aggregation [13], which replaces the list of all replica signatures with a single collective signature and so keeps the signature size constant. As the primary receives a signed message from each replica, it uses the received signatures to generate an aggregated signature. Replicas can verify the aggregated signature given the signed messages, the aggregated signature itself, and the public keys of the signers. As in other BFT-based protocols [9, 14, 15], each replica knows the public keys of the other replicas in the network. In Section III-B we explain how the replica IDs define the windows used in the algorithm.

It is not possible to ensure both the safety and the liveness of consensus algorithms in asynchronous systems in which even a single replica can crash [16]. Musch’s safety holds in asynchronous environments. To circumvent this impossibility for liveness, Musch assumes partial synchrony [17]. Partial synchrony is modeled with an arbitrarily large, unknown but fixed worst-case global stabilization delay.

During normal operation, Musch guarantees that at least a quorum of the replicas are consistent in each epoch. At any moment in time, the suffixes of the execution histories of any two replicas differ by at most the maximum number of blocks that can be committed during a time period determined by the maximum round-trip message delay in the network. Thus, any inconsistency is limited to only a small period of time.

Musch executes in epochs. An epoch is a slot of time in which the replicas receive a block proposed by the primary and agree to commit it. Thus, during each epoch a block is generated and added to the chain. The primary is responsible for aggregating replica signatures for block agreement; if fewer than $2f + 1$ replica signatures are collected, a view change is triggered and the primary is replaced. The primary is also responsible for collecting transactions from clients, ordering them, and sending them to the replicas.

III-A Normal Operation

As shown in Algorithm 1, the primary collects a set of transactions from the clients into an ordered list and packs it, together with a sequence number, a view number, a hash, and a hash history, into a candidate block. The primary then proposes (broadcasts) the candidate block to every replica. As shown in Algorithm 2, upon receipt of the candidate block each replica validates its contents and then responds to the primary with a message signaling its willingness to accept the block.

The primary collects at least $2f + 1$ responses from the replicas and aggregates them, generating a compressed aggregated signature [13]. Then, the primary broadcasts the aggregated signature. Upon receipt, each replica verifies the signatures and, if verification succeeds, commits the candidate block and responds to the client with a reply message that identifies the client and carries the timestamp and the result of execution. Upon receipt of $f + 1$ valid reply messages (which might require receiving $2f + 1$ messages) a client accepts the result. Assuming a continuous creation of blocks, the primary starts the new epoch immediately after the old epoch finishes. All timeouts are expressed in terms of the maximum delivery delay of a message in the network. According to the protocol, in the epoch of a new block there are two messages that a replica expects to receive from the primary: (i) the block proposal, within a bounded time from the end of the previous epoch, and then (ii) the aggregated signature for that block, within a bounded time since the receipt of the proposal. Therefore, the maximum time of an epoch for a replica is the sum of these two bounds. A replica goes into recovery mode as soon as either of the two expected messages is not received in time.
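The client-side acceptance rule can be sketched as follows. This is a minimal illustration that assumes a client accepts a result once $f + 1$ distinct replicas report the same value; the function name and the dictionary-based reply format are hypothetical, and signature verification is omitted.

```python
from collections import Counter
from typing import Optional


def accepted_result(replies: dict, f: int) -> Optional[str]:
    """Return the result reported by at least f + 1 distinct replicas.

    `replies` maps replica ID -> reported result, so each replica is
    counted once. With at most f Byzantine replicas, f + 1 matching
    replies must include one from a correct replica.
    Returns None while no result has enough support yet.
    """
    counts = Counter(replies.values())
    for result, votes in counts.most_common():
        if votes >= f + 1:
            return result
    return None
```

With $f = 1$, two matching replies are enough even if a third, Byzantine replica reports something else.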

upon receipt of transactions from a set of clients do
      Create a block with the sequence number that follows the latest committed block
      Broadcast the block to replicas
      upon receipt of hashes of the block from at least $2f + 1$ replicas do
            Aggregate the hashes into an aggregated signature
            Commit (block, aggregated signature)
            Broadcast the aggregated signature to replicas
            Send the reply to the client set
      end
end
Algorithm 1 Primary
// Normal Execution
upon receipt of a block from the primary with the sequence number that follows the latest committed block do
      Calculate the hash of the block
      Send the signed hash to the primary
      upon receipt of the aggregated hash for the block from the primary do
            if the aggregated hash is signed by at least $2f + 1$ replicas then
                  Commit the block
                  Send the reply to each client
            end if
      end
end
// Special Cases
check always at any time that
      if no receipt of expected block or respective hash within a timeout period then
            Execute Algorithm 3
      end if
      if receipt of a block with a sequence number beyond the next expected one then
            Execute Algorithm 3
      end if
      if receipt of a valid set of complaints with at least $f + 1$ complainers then
            Execute Algorithm 5 // initiate view change
      end if
end
Algorithm 2 Replica

III-B Recovery Mode

In BFT protocols, when a replica detects an error it broadcasts complaints to all replicas in the network. In contrast, a replica in Musch that detects a failure complains only to a subset of the replicas in the network, called window nodes. If the replica does not receive a response from the current window, it complains to the next window, of double size, until it receives a response from at least one correct replica. The window sequence is fixed, $W_0, W_1, \ldots, W_k$, where $k$ is large enough that $W_k$ contains at least $f + 1$ nodes. Suppose the replica IDs are sorted in ascending order (see Fig. 1). Window $W_0$ consists of the single node with the smallest ID, window $W_1$ consists of the two replicas with the next IDs in order, window $W_2$ consists of the four replicas with the next higher IDs, and so on. Therefore window $W_i$ consists of $2^i$ replicas, whose IDs occupy ranks $2^i$ through $2^{i+1} - 1$ in the sorted order (counting from 1). During the execution of the algorithm, the last window that is actually contacted is the first one by which more than $f'$ window nodes have been contacted in total, since this guarantees that at least one correct node is encountered among all the window nodes from $W_0$ up to that window.
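The window layout can be sketched in a few lines. The code below assumes that window $W_i$ holds the $2^i$ replicas at 0-based positions $2^i - 1$ through $2^{i+1} - 2$ of the sorted ID list, and that the last window index is the smallest $k$ with $2^k \ge f + 1$; both the helper names and that choice of $k$ are assumptions for illustration.

```python
def window(sorted_ids: list, i: int) -> list:
    """Return window W_i from a list of replica IDs sorted ascending.

    W_i holds 2**i replicas starting at 0-based position 2**i - 1.
    The slice is shorter when fewer replicas remain.
    """
    start = 2 ** i - 1
    return sorted_ids[start:start + 2 ** i]


def last_window_index(f: int) -> int:
    """Smallest k such that a full window W_k has at least f + 1 nodes.

    Assumption: this is how the final window is chosen; with at most f
    Byzantine replicas, f + 1 nodes must include a correct one.
    """
    k = 0
    while 2 ** k < f + 1:
        k += 1
    return k
```

For 13 replicas with IDs 0 to 12 and $f = 4$, the windows are `[0]`, `[1, 2]`, `[3, 4, 5, 6]`, and a final window holding the six remaining IDs.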

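Parameters: details of the missing message, passed by the caller
Let the requested sequence number be that of the block for which the replica has not received either the block or its aggregated hash
Start with the first window // window index
loop
      // examine the current window
      if the replica itself belongs to the current window then
            // all window nodes prior to the replica are faulty
            Broadcast the complaint message to replicas
            exit loop
      else
            // the replica is in a later window than the current one or is not a window node at all
            Send the complaint to all nodes in the current window
            if there is no commit by a certain timeout then
                  // increase window
                  Move to the next window and repeat
            else
                  exit loop
            end if
      end if
end loop
// listen for responses
Let the expected sequence number be that of the blocks expected in the time period since the complaint was issued
upon receipt of blocks and respective hashes up to at least the expected sequence number do
      Commit all received pairs of block and hash
end
Algorithm 3 Fault Recovery in Replica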
upon receipt of a complaint or a proof-of-maliciousness message from a replica do
      if the complaint is valid then
            Add the complaint by a distinct complainer to the set of complaints
            if the number of distinct complainers in the set is at least $f + 1$ then
                  Broadcast the set of complaints
                  Execute Algorithm 5
                  Reset the set to empty
            else
                  Let the requested sequence number be that of the block requested in the complaint
                  if the window node has the requested block and its hash then
                        Send all blocks and respective hashes starting from the requested sequence number up to the latest to the replica
                  end if
            end if
      else if the proof of maliciousness is valid then
            Broadcast the proof to replicas
            Execute Algorithm 5
            Reset the set to empty
      end if
end
Algorithm 4 Window Node

Algorithm 3 describes how a replica complains to the window(s), and Algorithm 4 shows the respective reactions of the window nodes. As shown in Algorithm 3, if a replica did not receive an expected message (the proposed block or its aggregated signature) from the primary during normal operation, it sends a complaint that refers to the last committed block in its own chain. When it complains to a window, the complaint is sent to all nodes in that window, which thereby learn that the replica is missing the messages that follow that block. If the replica has received a message from the primary that proves the primary’s maliciousness, it attaches the proof to its complaint.

When a replica enters the recovery mode it first complains to window $W_0$, which has a single node. If it does not get any useful response from $W_0$, it complains to $W_1$, which has two nodes, so it informs both nodes. This process can repeat until the replica has contacted all nodes in the last window. It is guaranteed that the replica will get a response from a correct node in one of these windows. As shown in Algorithm 4, the window nodes respond to complaints by returning the requested information. If they do not have it, they execute Algorithm 3 themselves. If the complainer is a window node itself, it escalates only until it reaches its own window, at which point it broadcasts the complaint. Upon broadcast it is guaranteed to receive a response, which is either the missing messages or a view change. If the replica receives the missing messages, it forwards them to the complainers that it knows of; otherwise the outcome is a view change (the primary is replaced).
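The escalation can be simulated to see how the number of complaint messages grows with the number of faulty window nodes. The sketch below is a simplification under stated assumptions: windows double in size over the sorted IDs, any correct window node answers immediately (the real protocol may require that node to recover first), and a complainer that reaches its own window falls back to a broadcast. The function name `recover` and its return format are hypothetical.

```python
def recover(sorted_ids: list, faulty: set, complainer) -> tuple:
    """Simulate window escalation for one complaining replica.

    Returns (index of the window that answered, or None if the
    complainer had to broadcast; number of complaint messages sent).
    """
    messages = 0
    i = 0
    while True:
        start = 2 ** i - 1
        members = sorted_ids[start:start + 2 ** i]
        if not members or complainer in members:
            # Reached own window (or ran out of windows): broadcast to everyone else.
            messages += len(sorted_ids) - 1
            return None, messages
        messages += len(members)  # one complaint per node of the current window
        if any(m not in faulty for m in members):
            return i, messages  # assumed: a correct window node replies
        i += 1  # whole window silent: escalate to the window of double size
```

With 13 replicas and replicas 0, 1 and 2 faulty, replica 12 sends 1 + 2 + 4 = 7 complaints before window $W_2$ answers; with no faults it sends a single message.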

Note that regular replicas and window nodes may be complaining at the same time, and probably for the same reason. A regular node has to wait for the window nodes to obtain a response first. It is important to coordinate the actions of the window nodes and the regular replicas so that responses are received efficiently without message replication. For a regular replica, the time it waits for a response from a window is therefore bounded: it covers the time needed to detect the timeout of the current epoch plus the time for the window to receive a message from the previous window and send it back to the replica. If a window does not receive a message from the previous window, it broadcasts its complaint and is guaranteed to receive a response, which it sends back to the replica. If, measured from the start of the current epoch, the replica does not get a response within this bound, it contacts the next window.

III-C View Change

A view change can be triggered if a correct window node receives at least $f + 1$ distinct replica complaints against the primary, as shown in Algorithm 4. This guarantees that at least one of the complaints comes from a correct replica.
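A window node can track this condition with a set of distinct complainers. The sketch below assumes a threshold of $f + 1$ distinct complainers and omits complaint validation; the class and method names are hypothetical.

```python
class ComplaintTracker:
    """Counts distinct complainers against the current primary."""

    def __init__(self, f: int):
        self.f = f
        self.complainers = set()

    def add(self, replica_id) -> bool:
        """Record a complaint; return True when a view change is due.

        Repeated complaints from the same replica count once, so f
        Byzantine replicas alone can never reach the f + 1 threshold.
        """
        self.complainers.add(replica_id)
        return len(self.complainers) >= self.f + 1

    def reset(self) -> None:
        """Clear the set after the view change has been triggered."""
        self.complainers.clear()
```

Counting distinct IDs rather than messages is the design choice that matters here: it prevents a single faulty replica from forcing a view change by repeating its complaint.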

Another reason for a view change is the receipt by a window node of an explicit proof of maliciousness against the primary. Once a view change is triggered, the window node broadcasts the set of complaints or the proof it has received to all replicas (Algorithm 4).

Without loss of generality, consider the case where the window node has sent the set of complaints to all replicas (the same mechanism also applies to other sets of messages). Upon receipt of this set, a replica increments its view number and assigns the new primary (Algorithm 5).

The replica then places its most recent block hash and block number, along with the set it received, in a view-change message and sends it to the new primary (Algorithm 5).

Upon receipt of at least $2f + 1$ view-change messages from different replicas, the new primary stores them in a set. It then broadcasts this set together with an aggregated signature for all replicas involved in it (Algorithm 6). Upon receipt of this message, each replica recovers the latest block history. Assume $s$ is the highest block number committed so far in the chain. Block $s$ must have been committed by at least $2f + 1$ replicas, and since the set has size at least $2f + 1$, at least $f + 1$ replicas in the set have also committed block $s$, one of which is a correct node. Thus, every replica, upon receipt of the set, can determine that the latest committed valid block number is $s$.
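One way to extract the latest committed block number from the collected view-change messages is sketched below. It rests on two assumptions that are not spelled out in the text: each message carries only the sender’s latest block number, and a replica reporting block number $x$ also vouches for every earlier block. Under these assumptions the answer is the $(f+1)$-th largest reported number. The function name is hypothetical.

```python
from typing import Optional


def latest_committed(reported: dict, f: int) -> Optional[int]:
    """Highest block number vouched for by at least f + 1 replicas.

    `reported` maps replica ID -> latest block number it claims.
    Returns None when fewer than 2f + 1 view-change messages arrived.
    """
    if len(reported) < 2 * f + 1:
        return None
    ordered = sorted(reported.values(), reverse=True)
    # ordered[f] is the (f + 1)-th largest value: at least f + 1 replicas
    # report a number this high, so at least one of them is correct.
    return ordered[f]
```

Taking the $(f+1)$-th largest value means that up to $f$ Byzantine replicas that overstate their history cannot push the result beyond what a correct replica has reported.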

Once the latest committed block number is known, a replica checks whether the block with that sequence number is the latest block in its history and, if it is, sends a confirmation message to the new primary (Algorithm 5).

In this case, at least $f + 1$ correct replicas know the latest block of the chain. If the new primary’s latest block is the same as this block, the new primary begins updating all other replicas that have fallen behind (Algorithm 6). It will not send any block generated earlier than the low watermark (Section IV). If the new primary does not have this block as its latest block, then at least $f + 1$ correct replicas know about it, and they send the missing blocks and their respective aggregated signatures to the new primary, which then updates the other replicas as described above. Once the new primary has updated the other replicas, it waits until the required number of correct replicas have sent their confirmations (Algorithm 6).

Once enough confirmations from correct replicas have arrived, the new primary signs the latest block in the histories it has received using an aggregated signature and broadcasts it to the replicas. Upon receipt, each replica is ready for the new epoch of the next block and waits to receive a block proposal from the new primary (Algorithm 5). If a replica does not receive the expected messages (the aggregated view-change messages, or blocks and their hashes) within the expected time, it issues a new complaint, which is processed like the other types of complaints described above.

During the view change process, some clients may send requests that are not processed because the replicas are busy. To address this, as mentioned earlier, the client broadcasts its request after the epoch timeout if it did not receive a response from the primary. In that case, all replicas receive the request and forward it to the new primary. Upon receipt of such forwarded requests, the new primary considers them for inclusion in its block proposal as soon as possible, and it has to propose these backlogged requests before proposing the new requests it receives. If it instead proposes a request that has not been seen by $2f + 1$ replicas (of which at least $f + 1$ are correct) ahead of the backlogged transactions, the replicas can send a complaint, which will result in a view change.

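Select the new primary
Send a view-change message containing the latest local block number to the new primary
Receive the aggregated view-change messages from the new primary
if the aggregate contains at least $2f + 1$ view-change messages then
      Get the latest block number that has been signed by at least $f + 1$ replicas in the aggregate
      if the latest block in the replica is the same as that block then
            Replica has not lost any block
      else
            Receive messages (blocks and their respective hashes) up to that block from the new primary before timeout
      end if
      Once updated, send a confirmation to the new primary
      Receive from the new primary the message containing the aggregated histories of the required number of replicas
end if
Algorithm 5 Replica View Change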
Receive view-change messages from replicas
Aggregate at least $2f + 1$ view-change messages into a set
Broadcast the set to replicas
Get the latest block number that has been signed by at least $f + 1$ replicas in the set
if the latest block in the new primary is the same as that block then
      New primary has not lost any block
else
      Receive messages (blocks and their respective hashes) up to that block from replicas that are up to date
end if
Send messages with missing blocks and hashes to all replicas that have fallen behind, where the oldest block sent should not be older than the latest low watermark
Once update confirmations have been received from the required number of replicas, aggregate them
Broadcast the aggregate
Algorithm 6 New Primary View Change

IV Checkpoints

As an optimization of the protocol, we use checkpoints to reduce the number of messages exchanged during a view change. Checkpoints are typically used to truncate the log in other BFT-based protocols [9]. In addition, we use them to prevent malicious replicas from downloading older messages from a new primary and so delaying the completion of the view change. As described in Section III-B, some correct replicas might miss messages and go into recovery mode; these replicas need to download the missing messages. Malicious replicas, however, might try to download very old blocks in order to delay the view change process. We use checkpoints to bound this. To maintain the safety condition, at least $2f + 1$ replicas must agree on the checkpoint. A checkpoint is created after a constant number of blocks (e.g., whenever the sequence number is divisible by 200). In Musch, replicas can agree on checkpoints during block agreement, with the checkpoint number added to the agreement message. A checkpoint that is agreed upon by $2f + 1$ replicas, of which at least $f + 1$ are honest, is called a stable checkpoint. Checkpoints have low and high watermarks. The low watermark is the last stable checkpoint, and the high watermark is the sum of the low watermark and a number of blocks $k$, where $k$ is large enough. If a replica wants to download a block older than the low watermark, the new primary ignores the download request and may conclude that the replica is maliciously trying to delay the view change process.
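The watermark rule can be illustrated with a few helper functions. This is a minimal sketch: the interval of 200 is the example value from the text, while the function names and the way the block count $k$ is passed in are assumptions.

```python
CHECKPOINT_INTERVAL = 200  # example value: sequence numbers divisible by 200


def is_checkpoint(seq: int) -> bool:
    """A checkpoint is proposed at every multiple of the interval."""
    return seq > 0 and seq % CHECKPOINT_INTERVAL == 0


def high_watermark(low_watermark: int, k: int) -> int:
    """High watermark = low watermark (last stable checkpoint) + k blocks."""
    return low_watermark + k


def should_serve(requested_seq: int, low_watermark: int) -> bool:
    """A new primary ignores requests for blocks older than the low watermark."""
    return requested_seq >= low_watermark
```

With the last stable checkpoint at block 400, a request for block 399 is ignored, while a request for block 400 or later is served.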

V Correctness Analysis

In this section we prove the correctness of the Musch protocol. Before we proceed, we define transaction completion and protocol correctness for Musch. A transaction issued by a client is considered completed when the client receives at least $f + 1$ valid reply messages. It is guaranteed that, upon receipt of $2f + 1$ reply messages from different replicas, at least $f + 1$ of them are valid. We will prove that Musch satisfies the following correctness criteria:

Definition 1 (Liveness).

Every transaction proposed by a correct client will eventually be completed in finite time.

Definition 2 (Safety).

A system is safe if, whenever a correct primary proposes a block of ordered transactions and the block is committed by at least $2f + 1$ replicas, any block that has been committed earlier has a smaller block number in the chain. Thus, the earlier block is in the prefix of the later block in the chain. Additionally, the order of transactions within a block remains identical in all correct replicas (due to the Merkle tree). Merkle trees are hash-based data structures in which each leaf node is the hash of a data block and each non-leaf node is the hash of its children. They are mainly used for efficient data verification.
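The Merkle tree mentioned in the definition can be sketched as follows to show why it fixes the transaction order. The construction below is a generic one, not necessarily the one used by Musch: it assumes SHA-256 and duplicates the last hash on levels with an odd number of nodes, and the function names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> bytes:
    """Compute a Merkle root over an ordered list of transactions (bytes).

    Leaves are hashes of the transactions; each parent is the hash of
    the concatenation of its two children.
    """
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pad odd levels by duplication
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because each parent hashes its children in order, swapping two transactions changes the root, so replicas that agree on the root also agree on the order.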

V-A Safety

Lemma 1.

Any two distinct committed blocks $B_1$ and $B_2$ must have different block numbers.


Consider two committed blocks $B_1$ and $B_2$. A set of at least $2f + 1$ replicas has agreed to all transactions in $B_1$ and has committed it. Similarly, a set of at least $2f + 1$ replicas has agreed to the transactions in block $B_2$ and committed it. Since there are $3f + 1$ replicas, there is at least one correct replica (out of the at least $f + 1$ replicas in the intersection of the two sets) that committed both $B_1$ and $B_2$. But a correct replica commits only one block with a specific block number. Thus, the two blocks must have different numbers. The same argument applies during recovery mode. ∎
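The counting step behind this proof can be written out explicitly. The derivation below assumes commit quorums of size at least $2f + 1$ and $n = 3f + 1$ replicas; $Q_1$ and $Q_2$ are illustrative names for the two sets of committing replicas.

```latex
\begin{aligned}
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n \\
               &\ge 2(2f + 1) - (3f + 1) \\
               &= f + 1 .
\end{aligned}
```

Since at most $f$ replicas are Byzantine, at least one of these $f + 1$ common replicas is correct.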

Lemma 2.

If block $B_1$ commits earlier than block $B_2$, then $B_1$ has a smaller block number than $B_2$.


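Let $B_1$ be the block that commits earlier and $B_2$ the block that commits later. As per Lemma 1, at least one correct replica has committed both $B_1$ and $B_2$. Suppose, for contradiction, that $B_2$ gets a block number that is smaller than the block number of $B_1$ (the two numbers differ by Lemma 1). A correct replica accepts $B_2$ only if $B_2$ is consistent with its local history, that is, only if the block number of $B_2$ is larger than that of the already committed $B_1$. This contradicts the assumption. ∎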

Lemma 3.

Musch is safe during view change.


During a view change (Algorithms 5 and 6), all replicas, including the new primary, retrieve the latest history and block number: at least $2f + 1$ replicas report their latest block number, and these include a correct replica that knows the latest committed block. All correct replicas thus learn the latest block in the history from the aggregated view-change messages. If the new primary already has this block, it begins updating all other replicas that have fallen behind in history; in other words, it updates all the replicas that do not have the blocks and respective aggregated signatures up to the latest block. If the new primary does not have this block as its latest block, then at least $f + 1$ correct replicas know about it (from the aggregated view-change messages), and they send the missing blocks and their respective aggregated signatures to the new primary, which then updates the other replicas as described above. Once the other replicas have been updated (they have received the blocks and aggregated signatures up to the latest block), the new primary waits up to a timeout period to receive update confirmations from the required number of replicas. Then it signs all their histories using an aggregated signature and broadcasts it to the replicas. Upon receipt, each replica is ready for the new epoch of the next block and waits to receive a block proposal from the new primary. ∎

Theorem 4 (Safety).

Musch is safe.


Lemma 3 guarantees safety when the new primary is correct. If the new primary is not correct, safety is guaranteed once a correct primary is eventually chosen. Therefore, by Lemmas 1, 2 and 3, Musch is safe whether the replicas are in normal, recovery, or view change mode. ∎

V-B Liveness

In this section we provide a proof for liveness of Musch.

Lemma 5.

Musch satisfies liveness when the primary is correct.


Consider a correct primary that executes Algorithm 1 and the replicas that execute Algorithm 2. The primary receives at least $2f + 1$ correct messages from replicas, aggregates them, and signs them using an aggregated signature. It then broadcasts the signed message to all replicas. Upon receipt of the message, each replica commits the block. The primary, along with all correct replicas, also sends a reply message to each client, and the clients mark the transaction as completed. ∎

Lemma 6.

If there are at least $f + 1$ complaints, or there is a complaint with a proof of maliciousness against the primary, then a view change will occur.


Algorithm 3 guarantees that, in the worst case, a replica can find a window to complain to that contains at least one correct replica, since the last window contains at least $f + 1$ nodes. Observe that once a replica has found an honest window node, the honest node is guaranteed to reply to its valid complaint, either by sending back the blocks and aggregated signatures or, if the number of complaints is greater than $f$, by broadcasting all complaints to the network and causing a view change (Algorithm 4).

If a replica receives at least $f + 1$ complaints from other replicas, it triggers a view change according to Algorithm 4. Since $f + 1$ complaints are received, at least one honest replica has complained.

Similarly, a replica may receive an explicit proof that the primary is faulty (the primary’s history is incorrect, or it has proposed an invalid transaction, etc.). In such a case only one complaint is needed to prove that the primary is malicious, and a view change is triggered. ∎

Lemma 7.

If a transaction is not completed then a view change will occur.


If a transaction does not complete after a sufficient time, the client broadcasts its transaction to the replicas. Upon receipt of the transaction, the replicas check whether they have already committed a block that contains it. If they have, each replica sends a reply to the client, and upon receipt of $f + 1$ valid replies the client considers the transaction complete. If the primary has not proposed the transaction, each replica forwards it to the primary and expects the primary to include it in the next block proposal (during normal operation). If the primary does not include it in the next proposal, the replicas start complaining, which results in a view change (if at least $f + 1$ replicas complain, by Lemma 6).

Another case that can prevent a request from being committed is when replicas receive an aggregated signature message signed by fewer than $2f + 1$ replicas. Such a message can be used as proof against the primary and a complaint can be made, which results in a view change (Lemma 6). ∎

Lemma 8.

Musch satisfies liveness even if a client request is received during a view change.


During the view change process, some clients may send transaction requests that are not processed because the replicas are busy with the view change. To address this, as mentioned earlier, the client broadcasts its request after the epoch timeout if it did not receive a response from the primary. In such a case, all replicas receive the request and forward it to the new primary. Upon receipt of such forwarded requests, the new primary considers them for inclusion in its block proposal as soon as possible. The new primary has to propose these backlogged client requests before proposing the new requests it receives. If it instead proposes a request that has not been seen by $2f + 1$ replicas (of which at least $f + 1$ are correct) ahead of the backlogged transactions, the replicas can start complaining, which results in a new view change (Lemma 6). ∎

Theorem 9 (Liveness).

Musch satisfies liveness and all correct transactions will be completed eventually.


Based on Lemmas 5, 7 and 8, any correct transaction request by a client will be completed within a finite period of time. ∎

VI Communication Complexity

For communication complexity, we count all messages that cause a reaction in our algorithm, and we refer to these as effective messages. In contrast, ineffective messages are those whose sources have been identified as malicious, so the recipient can ignore them. We measure the number of effective messages exchanged in an epoch, and we consider worst-case scenarios, with or without a view change. In other words, we consider worst-case performance attacks in which malicious replicas attempt to increase the communication of the protocol by causing correct replicas to send messages.

We consider separately the messages sent between clients and replicas and those sent only between replicas.

VI-A Client-Replica Communication Complexity

If a client sends a transaction to the primary and does not receive a response within the timeout, the client broadcasts the transaction to the replicas (a broadcast involves one message per replica). Upon receipt of a broadcast from a client, a replica that has already processed the client’s transaction answers the client with an acknowledgement. Otherwise, the replica forwards the client’s request to the primary, forcing it to process the request as soon as possible. The liveness property of our algorithm, Theorem 9, guarantees that eventually enough replicas will send acknowledgements to the client. Therefore, we get the following result:

Lemma 10.

For each transaction sent by a client, at most messages will be exchanged between the client and the replicas in order to process the transaction (i.e., include the transaction in a block).
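As a rough illustration of where the client-side messages come from, the sketch below tallies the timeout path described above: one initial request to the primary, one broadcast message per replica, and one acknowledgement from each replica that has already committed the transaction. The names `RetryTally` and `retry_tally` are hypothetical, and the tally is a simplification made for this example rather than the exact accounting behind the lemma; requests forwarded by replicas to the primary are replica-to-replica traffic and are not counted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryTally:
    """Client<->replica messages for one transaction on the timeout path."""
    initial_request: int    # client -> primary
    broadcast: int          # client -> every replica after the timeout
    acknowledgements: int   # replica -> client

    @property
    def total(self) -> int:
        return self.initial_request + self.broadcast + self.acknowledgements


def retry_tally(n: int, already_committed: int) -> RetryTally:
    """Tally messages when the client times out and broadcasts to n replicas.

    already_committed is the number of replicas that have committed the
    transaction and therefore acknowledge it (an assumed input).
    """
    if n < 1 or not 0 <= already_committed <= n:
        raise ValueError("need n >= 1 and 0 <= already_committed <= n")
    return RetryTally(initial_request=1, broadcast=n,
                      acknowledgements=already_committed)


# Worst case for this tally: every replica acknowledges.
worst = retry_tally(n=10, already_committed=10)
print(worst.total)  # 21
```

Under these assumptions the tally grows linearly in the number of replicas, since each replica receives at most one broadcast message and sends at most one acknowledgement.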

VI-B Replica-Replica Communication Complexity

In this section we analyze the communication complexity of the consensus engine of our protocol, which includes the primary and the replicas (n nodes in total). Both a malicious primary and malicious replicas can try to increase the communication complexity.

VI-B1 Messages caused by malicious primary

Let be the set of replicas that complain. First, we examine the case when the nodes in did not receive the block or message and they complain. A malicious primary can afford not to send such messages up to at most replicas, without getting caught as being malicious; that is, .

In this case, each of the complainers in may have to communicate with up to window nodes, since this guarantees a window that has at least one correct window node. This gives at most messages. In the worst case, out of the window replicas at most will be the honest ones that will broadcast to all replicas and will receive their response, to be forwarded to the complainers , giving at most additional messages. The total communication complexity in this case will be (since ):


VI-B2 Messages caused by malicious replicas

Suppose the complainers are malicious, so there are at most f of them. Window nodes do not respond to repeated complaints from the same replica (non-effective messages), which prevents malicious replicas from increasing the communication complexity. Nevertheless, each window node may respond once to each malicious request. A window node can respond to a complaint message in the following ways:

  • If the window node has the appropriate response to the complaint (i.e. it has the requested block or message), it will send it back to the replica that complained. Each complaining replica accesses no more window nodes than the bound on the total number of window nodes. Therefore, in this case, the number of messages is at most:

  • If window node does not have the appropriate response (block or ), then itself is also executing the window protocol from smaller to larger windows, and when it eventually points to its own window, it will broadcast the complaint to get a response from other replicas (acting as a regular window node). This scenario can only happen if all the previous windows are populated by faulty nodes. The number of complaints from to up to window nodes are bounded by . Similarly, the respective responses are bounded by . For calculating the messages from the broadcasts, out of the total window nodes, at most window nodes will react to the received complaints with broadcasts, since the first encountered window of size at least will respond to any complaint from a valid node. Thus, each of the up to windows nodes broadcasts to all replicas, causing additional messages. Therefore, in this case, the number of messages are at most:


VI-C View Change Communication Complexity

When a correct window node receives complaints it will broadcast all of them to all replicas ( messages). There are at most window nodes that will broadcast (since those window nodes could be correct in the last accessed window size), resulting to at most messages. Upon receipt of the broadcast message, each replica begins the view change process. The replica sends back a message to the new primary which also includes its history ( messages). The new primary aggregates all messages into and broadcasts ( messages). Upon receipt each replica extracts the most recent block as described in Section III-C. Therefore, the number of messages from this part of the algorithm is at most:


During this, at least correct replicas have the latest committed block , and this block is chosen as the starting point for the next epoch, which will build another block () over it. All other replicas that have block number less than as their latest block have to download all the blocks up to from . If does not have as its latest block, then replicas that have it will bring up to date. Thus, if replicas have as their latest block, then, at most replicas in the worst case get (download) messages up from the high water mark in checkpoint to . Let be the number of committed blocks from to . For each committed block we need two messages, first the block itself and the second is the message. Thus, we have:
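In words, every lagging replica fetches two messages per missing block. A minimal sketch of that tally follows, assuming that block heights are plain integers and that a replica whose latest block is older than the checkpoint starts downloading from the checkpoint. The function name `catch_up_messages` and its parameters are hypothetical and chosen only for this illustration.

```python
def catch_up_messages(latest_heights, checkpoint_height, target_height):
    """Tally download messages needed to bring lagging replicas up to date.

    latest_heights    -- latest committed block height held by each replica
    checkpoint_height -- height of the last checkpoint (high water mark)
    target_height     -- height of the latest committed block
    """
    if checkpoint_height > target_height:
        raise ValueError("checkpoint cannot be ahead of the latest block")
    total = 0
    for height in latest_heights:
        if height >= target_height:
            continue  # already up to date, nothing to download
        # Assumption: state up to the checkpoint is already available,
        # so a replica never downloads blocks below the checkpoint.
        start = max(height, checkpoint_height)
        missing_blocks = target_height - start
        # Two messages per block: the block and its accompanying message.
        total += 2 * missing_blocks
    return total


# Three replicas are current, one is 3 blocks behind, and one is behind
# the checkpoint (so it downloads the 4 blocks after the checkpoint).
print(catch_up_messages([12, 12, 12, 9, 4], checkpoint_height=8,
                        target_height=12))  # 14
```

With frequent checkpoints the number of missing blocks per replica stays small, so the total is dominated by the number of lagging replicas.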


Assuming frequent checkpoints (say, after every fixed number of blocks), we can assume that the number of committed blocks since the last checkpoint is a constant. From Equations 4 and 5 we have for the total number of messages in view change:


VI-D Overall Messages

Combining Equations 1, 2, and 3, we obtain O(fn + n) communication complexity between replicas in a single epoch. From Equation 6, the communication complexity is also O(fn + n) during view change. Therefore, we have the following result:

Lemma 11.

The number of messages exchanged between replicas in an epoch or during view change is O(fn + n).
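The dependence on the number of faults comes from the window escalation analysed above: a complainer stops as soon as it reaches a window that contains a correct node. The sketch below illustrates this under assumptions that are not taken from the protocol description: windows are disjoint, consecutive slices of a fixed node ordering, their sizes double (1, 2, 4, ...), and a complaint is sent to every node of each window visited. The function name `contacts_until_correct` is hypothetical.

```python
def contacts_until_correct(order, faulty):
    """Count window nodes a complainer contacts before reaching a window
    that contains at least one correct node.

    order  -- node ids in the (assumed) fixed window ordering
    faulty -- set of faulty node ids
    """
    start, size, contacted = 0, 1, 0
    while start < len(order):
        window = order[start:start + size]
        contacted += len(window)          # one complaint per window node
        if any(node not in faulty for node in window):
            return contacted              # a correct node will react
        start += size                     # move on to the next window
        size *= 2                         # windows grow exponentially
    raise ValueError("every node in the ordering is faulty")


# 31 nodes, 10 faulty nodes placed at the front of the ordering.
n, f = 31, 10
print(contacts_until_correct(list(range(n)), set(range(f))))  # 15
```

In this layout the windows of sizes 1, 2 and 4 are entirely faulty and the window of size 8 contains correct nodes, so the complainer contacts 15 nodes rather than all 31. More generally, filling the first k windows requires 2^k - 1 faulty nodes, so in this sketch a complainer contacts at most 2f + 1 window nodes regardless of n.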

Combining Lemmas 10 and 11 we obtain the main result for the communication complexity:

Theorem 12 (Communication complexity).

For initiated transactions in an epoch, the communication complexity is . For constant , the communication complexity is .
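For a rough sense of scale, the sketch below compares the two growth rates named in the abstract: n^2 for all-to-all broadcasting and fn + n for the window-based approach. Constant factors are dropped, so the numbers indicate growth only and are not measured message counts. The function names are hypothetical.

```python
def all_to_all(n: int) -> int:
    """Growth term for n x n broadcasting."""
    return n * n


def window_based(n: int, f: int) -> int:
    """Growth term f*n + n, valid for f < n/3."""
    if not 0 <= 3 * f < n:
        raise ValueError("need 0 <= f < n/3")
    return f * n + n


for n in (16, 64, 256):
    f_max = (n - 1) // 3  # largest f with f < n/3
    print(n, all_to_all(n), window_based(n, 1), window_based(n, f_max))
```

For a fixed number of faults the second term grows linearly in n, while for f close to n/3 it approaches the same quadratic order as broadcasting, which matches the statement that the cost adjusts to the actual number of faults.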

VII Conclusions

In this paper we proposed Musch, a BFT-based consensus protocol, in an effort to avoid excessive messages and improve the scalability of blockchain algorithms. Through the use of windows, the algorithm adapts to the actual number of faulty nodes f, and in this way it avoids unnecessary messages. This improvement does not sacrifice latency, since our algorithm still uses a small number of communication rounds. For future work, it would be interesting to investigate whether the message complexity under f faults can be decreased further by introducing an intelligent scheme to detect faulty nodes and foil attempts to increase message complexity.


  • [1] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system.” [Online]. Available:
  • [2] M. Vukolic, “The quest for scalable blockchain fabric: Proof-of-work vs. BFT replication,” in Open Problems in Network Security - IFIP WG 11.4 International Workshop, iNetSec 2015, Zurich, Switzerland, October 29, 2015, Revised Selected Papers, 2015, pp. 112–125.
  • [3] K. J. O’Dwyer and D. Malone, “Bitcoin mining and its energy footprint,” in 25th IET Irish Signals Systems Conference 2014 and 2014 China-Ireland International Conference on Information and Communications Technologies (ISSC 2014/CIICT 2014), June 2014, pp. 280–285.
  • [4] D. G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” pp. 1–33, 2017. [Online]. Available:
  • [5] I. Eyal, A. E. Gencer, E. G. Sirer, and R. V. Renesse, “Bitcoin-ng: A scalable blockchain protocol,” in 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16).   Santa Clara, CA: USENIX Association, 2016, pp. 45–59.
  • [6] L. Lamport, “Using time instead of timeout for fault-tolerant distributed systems.” ACM Trans. Program. Lang. Syst., vol. 6, no. 2, pp. 254–280, Apr. 1984.
  • [7] R. Kotla, A. Clement, E. Wong, L. Alvisi, and M. Dahlin, “Zyzzyva: Speculative byzantine fault tolerance,” Commun. ACM, vol. 51, no. 11, pp. 86–95, Nov. 2008.
  • [8] R. Guerraoui, N. Knežević, V. Quéma, and M. Vukolić, “The next 700 bft protocols,” in Proceedings of the 5th European Conference on Computer Systems, ser. EuroSys ’10.   New York, NY, USA: ACM, 2010, pp. 363–376.
  • [9] M. Castro and B. Liskov, “Practical byzantine fault tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, ser. OSDI ’99.   Berkeley, CA, USA: USENIX Association, 1999, pp. 173–186.
  • [10] J. Liu, W. Li, G. O. Karame, and N. Asokan, “Scalable byzantine consensus via hardware-assisted secret sharing,” CoRR, vol. abs/1612.04997, 2016.
  • [11] G. Golan-Gueta, I. Abraham, S. Grossman, D. Malkhi, B. Pinkas, M. K. Reiter, D. Seredinschi, O. Tamir, and A. Tomescu, “SBFT: a scalable decentralized trust infrastructure for blockchains,” CoRR, vol. abs/1804.01626, 2018. [Online]. Available:
  • [12] M. K. Reiter, “Secure agreement protocols: Reliable and atomic group multicast in rampart,” in Proceedings of the 2nd ACM Conference on Computer and Communications Security, ser. CCS ’94.   New York, NY, USA: ACM, 1994, pp. 68–80.
  • [13] D. Boneh, C. Gentry, B. Lynn, and H. Shacham, “Aggregate and verifiably encrypted signatures from bilinear maps,” in Proceedings of the 22nd International Conference on Theory and Applications of Cryptographic Techniques.   Berlin, Heidelberg: Springer-Verlag, 2003, pp. 416–432.
  • [14] L. Lamport, R. Shostak, and M. Pease, “The byzantine generals problem,” ACM Trans. Program. Lang. Syst., vol. 4, no. 3, pp. 382–401, Jul. 1982.
  • [15] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, and P. Saxena, “A secure sharding protocol for open blockchains,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’16.   New York, NY, USA: ACM, 2016, pp. 17–30.
  • [16] M. J. Fischer, N. A. Lynch, and M. S. Paterson, “Impossibility of distributed consensus with one faulty process,” J. ACM, vol. 32, no. 2, pp. 374–382, Apr. 1985.
  • [17] C. Dwork, N. Lynch, and L. Stockmeyer, “Consensus in the presence of partial synchrony,” J. ACM, vol. 35, no. 2, pp. 288–323, Apr. 1988.