## 1 Introduction

There has been an upsurge of interest in cryptocurrencies and distributed ledger technologies since the success of Bitcoin. At its core, a blockchain system relies on a consensus protocol that ensures all nodes in the network agree on a single chain of transaction history. Correct functioning is guaranteed regardless of any adverse influence of malfunctioning or malicious nodes. The underlying blockchain technologies have attracted a vast amount of interest from business and innovation, with applications ranging from logistics and healthcare to smart cities.

Fault-tolerant consensus has been extensively studied in distributed systems [4, 5, 2, 13]. *Byzantine* fault tolerance (BFT) [11] tolerates failures in up to a third of the participant machines. In BFT systems, reaching agreement is crucial to guarantee consistency of the continuous replication of the distributed state machine across the network.
Byzantine nodes are participant processes (or machines) that may be faulty, disconnected, or adversarial.
A Byzantine failure is often caused by a malfunctioning or malicious process [11]. If multiple Byzantine components co-exist, they may collude to cause more damage to the network. Byzantine faults are considered the most severe and challenging to deal with, whereas a crash failure is often considered a benign case.

In spite of differing states across distributed processes, a BFT consensus algorithm guarantees that all honest processes agree on common states and common data values. Services built on top of these BFT algorithms can guarantee that all honest nodes perform the same sequence of actions regardless of faulty components and unreliable communication links. This consensus guarantee is crucial in distributed BFT systems. Consensus algorithms, which guarantee transaction integrity over the distributed network, are equivalent to the proof of BFT [3, 10]. Practical BFT (pBFT) [5] can reach consensus on a block once the block is shared with other participants and the shared information is further shared with others [9, 12].

Most previous work on BFT systems has focused on *common knowledge* or *common state* of the states of all participant processes in the network [15, 8]. It is fundamentally important in BFT systems to know what a process has learned about the other processes’ states, as well as about the global state of the system. However, there is a gap in reasoning about common knowledge of Byzantine faults in a BFT system, such as whether the existence of a Byzantine node is known by all honest nodes. Given the dynamic nature of a realistic network, processes may fail or get compromised unexpectedly and unpredictably. Thus, it is critical to reason about which processes know about the faulty processes of the network.

To motivate our study, we start with the following problem, which describes a simple version of the Byzantine fault detection problem.

### 1.1 The cheater problem

On a remote island, there are villagers living together. Each day, all villagers go out to collect fruits around the island and carry them back to the village in a box. Each villager individually counts the number of fruits in the box and records the count on the box. The villagers can then check whether there is a count that a majority of them agree on, and hence each villager can find out whether there are any cheaters. If a cheater is found, the villager gossips the new finding to all villagers on the same day.

The objective of the problem is to determine whether the villagers can discover and agree on the existence of every cheater in the village. Previous work has mainly focused on ensuring that all honest villagers know the correct number of fruits (i.e., common knowledge of the states of all processes). There is a lack of understanding of whether the honest villagers also come to agree on who the cheaters are. It also remains unknown whether common knowledge about cheaters is possible if cheaters have different probabilities of cheating.

### 1.2 Our approach

To study the cheater problem, we investigate a model of a BFT system in which individual cheaters may have different probabilities of cheating. There are two types of nodes in a distributed system: Zenta (honest) nodes and Byzantine (cheating) nodes. In our approach, we assume that every Byzantine node (cheater) has a probability of fault (cheating), whereas every Zenta node has zero probability of cheating. Based on these probabilities, we then study whether the network will ultimately discover all Byzantine processes (cheaters).

In particular, this paper introduces a semi-formal model of the cheater problem.
We use the terminology of network, process and common knowledge from previous work [15].
A *process* is a participant machine or node of the network. There are two types of processes: (1) a Zenta process is an honest process (denoted by Z); (2) a Byzantine process is a malfunctioning or malicious process (denoted by B).
Unlike previous work, we consider a general case of Byzantine processes, which can misbehave randomly and unpredictably. These are so-called *probabilistic Byzantine* (PB) processes.
In its most general form, our model assigns each process (whether Z or B) a probability of cheating $p \in [0,1]$. A Z process has $p = 0$, i.e., its probability of cheating is zero. In contrast, a B or PB process has a positive probability of cheating, i.e., $p > 0$.

We then present our study of the proposed model in both synchronous and asynchronous BFT systems. We show how the model can help reason about common knowledge of the probabilistic Byzantine processes by all processes in the network. Interestingly, our study of the model shows that a PB process with a higher probability of cheating is more likely to be detected than PB processes with a lower probability.

The rest of the paper is organized as follows. Section 2 gives related work. Section 3 describes our approach to analysing the common knowledge of all processes with respect to the cheating probabilities. Section 4 gives our study of asynchronous BFT systems using our approach. Section 5 discusses several points that follow from our studies. Section 6 concludes.

## 2 Related work

This section gives related work on fault-tolerant consensus and Byzantine fault-tolerance in distributed systems.

### 2.1 Byzantine fault tolerance in trustless systems

A distributed system is often comprised of physically distinct entities, which are also named nodes, processors, agents or sensors.
They are geographically separated and each of them has only partial knowledge of the system. The term *process* is commonly used to denote any computing entity. The system is functional if all processes of the system exchange information and can reach agreement with each other on certain data values to achieve a common goal.
The system can face unpredictable faults and adversarial influence due to faulty processes and unreliable communication channels.

Fault-tolerant consensus protocols have been studied extensively. Examples include Paxos [10] for crash faults and Practical BFT (pBFT) [5], Zyzzyva [9], Q/U [1], and HQ [7] for Byzantine faults, to name just a few. The two types of failures are crash failures and Byzantine failures. In a Byzantine failure, the process may act arbitrarily, send contradictory messages to peers, or simply remain silent, whereas a crashed process stops functioning completely and does not resume. BFT systems can be synchronous [2], partially asynchronous, or asynchronous [13].

For asynchronous distributed systems [4, 6, 14], consensus is challenging due to the very nature of distributed computing: each process has only partial knowledge of the system, and none can capture the global state of the system instantaneously. This is because of the geographical distance between the processes and the uncertainty introduced by asynchrony and failures. Theoretically, a crashed process cannot be distinguished from a very slow process in an asynchronous BFT system.

BFT protocols have been used in a wide range of applications, including replicated file systems, backup, and block stores. Many of them guarantee *safety* and *liveness* even in the presence of arbitrary Byzantine replicas. The safety property *linearizability* ensures a sequential order of execution of the requests as seen by clients, whereas the liveness property ensures that all valid requests from clients are eventually executed.

### 2.2 Knowledge and Common knowledge

Several works give a fundamental understanding of knowledge and common knowledge in distributed systems. The CCK paper [15] defines a formal model of concurrent common knowledge, which is used to study agreement in asynchronous systems. Common knowledge has also been studied in a Byzantine environment [8].

Byzantine consensus considers the problem of reaching agreement among a system of $n$ processes $P_1, P_2, \dots, P_n$. The processes communicate by sending messages to each other. Each process $P_i$ has an initial value $v_i$, and it has to decide on a value at each step. A BFT system may contain up to $f$ Byzantine processes, which may deviate from the protocol in an arbitrary manner. Regardless of the determinism of the protocol, several factors can cause non-determinism in the system: processes may vary in speed, they may be unable to determine the order in which the messages they received were originally sent, and faulty processes may behave arbitrarily.

A protocol reaches consensus if it satisfies that every honest process decides on the same value, after a finite number of steps.
There are two theoretical models for the Byzantine consensus problem: *Byzantine broadcast* and *Byzantine agreement* [16, 11]. In Byzantine broadcast, a designated sender tries to broadcast a value to the processes, whereas in Byzantine agreement, every process holds an initial input value. With $f$ denoting the number of Byzantine processes out of $n$, Byzantine agreement requires $f < n/3$ under partial synchrony or asynchrony even with digital signatures, but can be achieved with $f < n/2$ under synchrony.

For Byzantine agreement [11, 17], each process initiates a Byzantine broadcast to send its value to peers in parallel. After the broadcast, every honest process will share the same vector of values $V = \{v_1, v_2, \dots, v_n\}$, where $v_i$ is the input value of an honest process $P_i$. If all honest processes start with the same input value $v$, then $v$ will be the most frequent value in $V$, achieving the validity condition. Agreement is reached since all honest processes share the same $V$.

## 3 Common Knowledge: Probabilistic Byzantine Fault Tolerance

In this section, we present our approach to studying BFT systems in which each cheater cheats with some probability.

### 3.1 Problem definition

A network consists of $n$ processes $P = \{P_1, P_2, \dots, P_n\}$. Each process $P_i$ can also be denoted by its index $i$. The processes communicate with each other via messages. We consider a BFT system in which a fraction $\beta$ of the processes are Byzantine and the remaining fraction $1-\beta$ are Zenta processes. The upper bound of $\beta$ may vary between different systems; for example, $\beta < 1/3$ in some systems and $\beta < 1/2$ in others.

On each day $t$, a question $q_t$ is given to all processes, and each process $i$ needs to give a binary answer $a^t_i \in \{0, 1\}$. Assume that all processes except the Byzantine ones are honest and well-behaved. They always give the correct answer, i.e., $a^t_i = c_t$, where $c_t$ denotes the correct answer to $q_t$. However, some process $j$ in $P$ may sometimes give a wrong answer, e.g., $a^t_j = 1 - c_t$, with probability $p_j$, where $0 \le p_j \le 1$. For each honest process $i$, $p_i$ is 0, whilst a Byzantine process $j$ has $p_j$ greater than zero.

### 3.2 Answer vector and common state

On day $t$, a question $q_t$ is given to the processes. The answers from the $n$ processes are given by the vector $A^t = (a^t_1, a^t_2, \dots, a^t_n)$, where $a^t_i \in \{0, 1\}$ is the answer of process $i$. The common answer, which a supermajority agrees on, is guaranteed to be the correct answer by BFT systems. Any voter who gives an answer that differs from the common answer is a cheater.

Let $\mu^t$ be the mean value of the vector $A^t$. There are two cases in 1/3-BFT systems: (a) if a supermajority of the answers is 1, then $\mu^t > 2/3$; (b) if a supermajority of the answers is 0, then $\mu^t < 1/3$. Hence, a process $i$ is a cheater if the difference $|a^t_i - \mu^t| > 2/3$. Thus, every process can deduce from the common answer, which is computed from the vector $A^t$, whether any process, including itself, is a cheater.
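As a minimal sketch of the detection rule above, the following Python function takes one day's binary answer vector, determines the answer backed by a supermajority (assumed here to mean strictly more than two thirds of the processes), and returns the indices of the processes that deviate from it. Comparing each answer with the supermajority answer is equivalent to comparing it with the mean of the vector. The name `find_cheaters` and the error raised when no supermajority exists are illustrative assumptions, not part of the model.

```python
from typing import List


def find_cheaters(answers: List[int]) -> List[int]:
    """Return the indices of processes whose answer differs from the
    supermajority answer of a binary answer vector.

    Assumption: a supermajority is strictly more than 2/3 of the processes.
    Raises ValueError if the vector is empty, non-binary, or has no supermajority.
    """
    n = len(answers)
    if n == 0:
        raise ValueError("answer vector is empty")
    if any(a not in (0, 1) for a in answers):
        raise ValueError("answers must be binary (0 or 1)")

    ones = sum(answers)
    # Integer comparison avoids floating-point issues: ones / n > 2/3.
    if 3 * ones > 2 * n:
        common = 1
    elif 3 * (n - ones) > 2 * n:
        common = 0
    else:
        raise ValueError("no supermajority answer")

    return [i for i, a in enumerate(answers) if a != common]


if __name__ == "__main__":
    # Seven processes; process 3 deviates from the common answer 1.
    print(find_cheaters([1, 1, 1, 0, 1, 1, 1]))
```

Every process can run the same function on the same vector, so all honest processes flag the same set of indices for that day.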

### 3.3 Know relation

Let $C^t_j$ denote the event that process $j$ is cheating on day $t$. For simplicity, in this section, we assume that a cheater has the same probability of cheating every day. That is, $\Pr[C^t_j] = p_j$ for every day $t$.

Let $K^t_{ij}$ denote the certainty level with which process $i$ knows, by day $t$, that process $j$ is a Byzantine process. We make no assumption about whether a process knows that it is itself Byzantine.

On the first day, the probability that process $i$ knows process $j$ is a cheater is $K^1_{ij} = p_j$. On the $t$-th day, process $i$ knows whether process $j$ is a cheater with a certainty equal to:

$$K^t_{ij} = 1 - (1 - p_j)^t \tag{1}$$

As $t$ increases, $(1 - p_j)^t$ reduces toward zero because $0 \le 1 - p_j < 1$ for a cheater. Thus, $K^t_{ij}$ tends to 1 after a sufficiently large number of days.
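To make the growth of the certainty level concrete, here is a small Python sketch. It assumes the certainty after $t$ days for a process with a fixed cheating probability $p$ is $1 - (1 - p)^t$, i.e., the chance that the process has cheated on at least one of $t$ independent days. The function names `certainty` and `days_to_reach` are hypothetical helpers introduced only for illustration.

```python
def certainty(p: float, t: int) -> float:
    """Certainty after t days that a process with cheating probability p
    has been exposed, assuming independent days: 1 - (1 - p)**t."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if t < 0:
        raise ValueError("t must be non-negative")
    return 1.0 - (1.0 - p) ** t


def days_to_reach(p: float, target: float) -> int:
    """Smallest number of days t >= 1 with certainty(p, t) >= target.

    Requires p > 0 (an honest process is never exposed) and 0 < target < 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    t = 1
    while certainty(p, t) < target:
        t += 1
    return t


if __name__ == "__main__":
    for p in (0.5, 0.1):
        print(p, days_to_reach(p, 0.99))
```

Under these assumptions, a process that cheats half of the time reaches 99% certainty within 7 days, while one that cheats 10% of the time needs 44 days.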

### 3.4 Cheating Detection

We now present a matrix to capture the know relation for every pair of processes.

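Let $K^t$ denote the $n \times n$ who-knows-whom matrix on day $t$, in which the value at row $i$ and column $j$ is $K^t_{ij}$, the certainty level with which process $i$ knows that process $j$ is a Byzantine process.

On day one, the matrix is given by:

$$K^1 = \sum_{j=1}^{n} p_j E_j$$

where $p_j$ is the cheating probability of process $j$ and $E_j$ is the $n \times n$ matrix whose $j$-th column contains all 1’s and whose other entries are 0.

On the $t$-th day, the matrix becomes:

$$K^t = \sum_{j=1}^{n} \left(1 - (1 - p_j)^t\right) E_j$$

Since $0 \le 1 - p_j < 1$ for every cheater $j$, $(1 - p_j)^t$ tends to 0 as $t$ increases. Thus, $K^t_{ij}$ tends to 1 for every cheater $j$. That is, for every cheater $j$, column $j$ of $K^t$ approaches all 1’s. Remarkably, this shows common knowledge: every cheater is known by all processes for some sufficiently large value of $t$ (days).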
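The who-knows-whom matrix can be sketched in a few lines of Python. The sketch assumes that the entry at row $i$ and column $j$ equals $1 - (1 - p_j)^t$, so that every row is identical and column $j$ depends only on the cheating probability of process $j$. The helper names `know_matrix` and `commonly_known_cheaters`, and the use of a certainty threshold to decide when a column counts as "all 1's", are assumptions made for illustration.

```python
from typing import List


def know_matrix(p: List[float], t: int) -> List[List[float]]:
    """Build the n x n matrix whose entry [i][j] is 1 - (1 - p[j])**t.

    p[j] is the (assumed constant) cheating probability of process j.
    """
    column_values = [1.0 - (1.0 - pj) ** t for pj in p]
    # Every observer i holds the same certainty about process j.
    return [list(column_values) for _ in range(len(p))]


def commonly_known_cheaters(p: List[float], t: int, threshold: float) -> List[int]:
    """Indices j whose whole column has reached the certainty threshold,
    i.e., every process is at least `threshold` certain that j cheats."""
    matrix = know_matrix(p, t)
    n = len(p)
    return [j for j in range(n)
            if all(matrix[i][j] >= threshold for i in range(n))]


if __name__ == "__main__":
    probabilities = [0.0, 0.0, 0.5, 0.1]  # processes 2 and 3 are cheaters
    print(commonly_known_cheaters(probabilities, 10, 0.99))
    print(commonly_known_cheaters(probabilities, 50, 0.99))
```

With these example probabilities, only the more aggressive cheater (process 2) crosses the threshold after 10 days, while both cheaters do after 50 days. The columns of the honest processes stay at 0.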

## 4 Extended study: Probabilistic Byzantine Fault Tolerance in aBFT systems

In this section, we extend our study of the cheater problem to asynchronous BFT systems. We first give a motivating example, the asynchronous cheater problem, which captures a simple version of the Byzantine fault detection problem in asynchronous systems. We then present our formulation for studying common knowledge of Byzantine processes that cheat probabilistically.

### 4.1 The asynchronous cheater problem

Assume there is a village of $n$ villagers on a remote island. On the first day, a group of villagers goes out to collect fruits and brings them back to the village in a box. Each villager in the group counts the number of fruits collected and individually records the count on the box. On the second day, another group of villagers picks fruits and brings them back, and each of them individually writes down a count. Each of them then finds the boxes collected on previous days and writes a count on each, if this was not done previously. And so on. Once a villager has written down a count, the villager can check whether there is a count that a majority agrees on, and hence find out whether any of the villagers is honest or cheating. If a cheater is found, that villager can gossip the new finding to all villagers on that day.

Notably, the asynchronous version differs from the previous version of the cheater problem (described in Section 1.1). The difference is that, on any given day, the box of that day only has the counts of that day's group (fewer than the total number of villagers), whereas in the previous version all counts are recorded on the same day.

### 4.2 Problem definition

Consider a network of $n$ processes, comprising honest processes and Byzantine processes. Each day, a group of processes is selected; each of them is given the question of the day and is required to give a binary answer $a^t_i \in \{0, 1\}$. They also give their answers to the questions of previous days that they have not answered yet. Each process $i$ may give a wrong answer, e.g., $a^t_i = 1 - c_t$, where $c_t$ is the correct answer, with probability $p_i$, where $0 \le p_i \le 1$. Honest processes always give the correct answer, i.e., $a^t_i = c_t$.

For simplicity of formulation, we assume that each process is labeled with an index. Processes of the selected group take turns (based on their index) to write their answers to the question(s).

We define a notion of *round* to indicate when a question from the sequence has been answered by all the processes.
With this definition, the formulation can be expressed similarly to the previous version, except we replace day by round.
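The notion of a round can be illustrated with a short Python sketch. It assumes that each day one new question is posted, and that every process in that day's group answers the new question together with all earlier questions it has not yet answered. A round is counted as complete once a question has answers from all $n$ processes. The class `RoundTracker` and its method names are hypothetical and serve only to illustrate the bookkeeping.

```python
from typing import List, Set


class RoundTracker:
    """Tracks which processes have answered each question (illustrative helper)."""

    def __init__(self, n: int) -> None:
        self.n = n
        # answered[q] is the set of process indices that answered question q.
        self.answered: List[Set[int]] = []
        self.completed_rounds = 0

    def new_day(self, group: Set[int]) -> int:
        """Post today's question; the selected group answers it and every
        earlier question its members have not answered yet.
        Returns the number of completed rounds so far."""
        if not all(0 <= i < self.n for i in group):
            raise ValueError("group contains an unknown process index")
        self.answered.append(set())
        for voters in self.answered:
            voters |= group
        # Earlier questions always have at least as many answers as later ones,
        # so rounds complete in question order.
        while (self.completed_rounds < len(self.answered)
               and len(self.answered[self.completed_rounds]) == self.n):
            self.completed_rounds += 1
        return self.completed_rounds


if __name__ == "__main__":
    tracker = RoundTracker(4)
    for day, group in enumerate([{0, 1}, {2, 3}, {0, 1}], start=1):
        print(day, tracker.new_day(group))
```

In this example with four processes, no round is complete after day 1, the first question completes on day 2 once the second group catches up, and the second question completes on day 3.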

At round $r$, question $q_r$ has the answers from all $n$ processes. The answers are given by the vector:

$$A^r = (a^r_1, a^r_2, \dots, a^r_n)$$

Each process can deduce from the vector $A^r$ the common answer agreed on by the supermajority. Each process can then figure out whether there are cheaters, possibly including itself.

### 4.3 Know relation

Let $C^r_j$ denote the event that process $j$ is cheating at round $r$. We assume that a cheater cheats with the same probability on every day and in every round, that is, $\Pr[C^r_j] = p_j$. Let $K^r_{ij}$ denote the certainty level with which process $i$ knows, by round $r$, that process $j$ is a Byzantine process. A process may not know whether it is itself Byzantine.

In the first round, the probability that process $i$ knows process $j$ is a cheater is $K^1_{ij} = p_j$. At round $r$, the certainty with which process $i$ knows that process $j$ is a cheater equals

$$K^r_{ij} = 1 - (1 - p_j)^r$$

As $r$ increases, $(1 - p_j)^r$ reduces toward zero for every cheater $j$, and so $K^r_{ij}$ tends to 1.
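The closed form can be cross-checked with a small Monte Carlo sketch in Python. The simulation assumes that each completed round is an independent trial in which the cheater gives a wrong answer with probability $p$, and that any wrong answer is noticed once the round is complete. The function name `simulate_detection` and the fixed seed are illustrative choices.

```python
import random


def simulate_detection(p: float, rounds: int, trials: int, seed: int = 0) -> float:
    """Estimate the chance that a process with cheating probability p is
    caught at least once within the given number of completed rounds."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    caught = 0
    for _ in range(trials):
        # The process is exposed if it cheats in any one of the rounds.
        if any(rng.random() < p for _ in range(rounds)):
            caught += 1
    return caught / trials


if __name__ == "__main__":
    p, rounds = 0.2, 5
    estimate = simulate_detection(p, rounds, trials=20000)
    closed_form = 1.0 - (1.0 - p) ** rounds
    print(f"simulated={estimate:.3f} closed_form={closed_form:.3f}")
```

With enough trials, the simulated frequency should lie close to $1 - (1 - p)^r$, which supports reading the certainty level as the probability of at least one observed deviation.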

### 4.4 Cheating Detection

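As in the synchronous version of the cheater problem, all cheaters are eventually found out.

At round $r$, the who-knows-whom matrix becomes:

$$K^r = \sum_{j=1}^{n} \left(1 - (1 - p_j)^r\right) E_j$$

where $p_j$ is the cheating probability of process $j$ and $E_j$ is the $n \times n$ matrix whose $j$-th column contains all 1’s and whose other entries are 0.

As $r$ increases, $(1 - p_j)^r$ tends to 0 for every cheater $j$. Thus, column $j$ of $K^r$ approaches all 1’s.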

## 5 Discussions

In this section, we discuss our approach with respect to the chance of detecting Byzantine processes. We also discuss how to model the case in which the cheating probability of a Byzantine process varies from day to day (round to round).

### 5.1 Probability of Detection

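Let $j$ and $k$ denote two Byzantine processes, and let $p_j$ and $p_k$ be their respective probabilities of cheating. The chances of being discovered by a process $i$ by day (round) $t$ are $K^t_{ij} = 1 - (1 - p_j)^t$ and $K^t_{ik} = 1 - (1 - p_k)^t$, respectively. Hence, the difference of their chances is given by:

$$
\begin{aligned}
\Delta^t_{jk} &= K^t_{ij} - K^t_{ik} \\
&= (1 - p_k)^t - (1 - p_j)^t \\
&= (p_j - p_k) \sum_{s=0}^{t-1} (1 - p_k)^{s} (1 - p_j)^{t-1-s}
\end{aligned}
\tag{2}
$$

Since $0 \le 1 - p_j \le 1$ and $0 \le 1 - p_k \le 1$, the sum is non-negative, so the sign of $\Delta^t_{jk}$ depends on the sign of $p_j - p_k$. Without loss of generality, we assume that $p_j > p_k$. This gives $1 - p_j < 1 - p_k$, and so $\Delta^t_{jk} > 0$ from the above equation. Thus, $K^t_{ij} > K^t_{ik}$.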

Remarkably, we have shown that a Byzantine process with a higher cheating probability has a higher chance of being caught every single day (round) in synchronous (asynchronous) BFT systems.
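The comparison can be checked numerically with a short Python sketch. It assumes the chance of discovery after $t$ days is $1 - (1 - p)^t$ for a constant cheating probability $p$, and compares the direct difference with the factored form obtained from the identity $a^t - b^t = (a - b)\sum_{s=0}^{t-1} a^s b^{t-1-s}$. The function names `detection_gap` and `factored_gap` are hypothetical.

```python
def detection_gap(p_j: float, p_k: float, t: int) -> float:
    """Difference between the detection chances of processes j and k
    after t days: (1 - p_k)**t - (1 - p_j)**t."""
    return (1.0 - p_k) ** t - (1.0 - p_j) ** t


def factored_gap(p_j: float, p_k: float, t: int) -> float:
    """Same difference written as (p_j - p_k) times a non-negative sum."""
    a, b = 1.0 - p_k, 1.0 - p_j
    return (p_j - p_k) * sum(a ** s * b ** (t - 1 - s) for s in range(t))


if __name__ == "__main__":
    p_j, p_k = 0.5, 0.25  # process j cheats more often than process k
    for t in (1, 2, 5, 10, 20):
        gap = detection_gap(p_j, p_k, t)
        print(t, round(gap, 6), abs(gap - factored_gap(p_j, p_k, t)) < 1e-12)
```

For these example values the gap is positive on every day, in line with the argument above. It also shrinks as $t$ grows, since both detection chances approach 1.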

### 5.2 Randomized Byzantine processes

So far, we have assumed that each Byzantine process has the same cheating probability every day (round). We now consider a more general problem in which each Byzantine process has a cheating probability that varies over time.

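Let $p_j(s)$ denote the probability of a Byzantine fault of Byzantine process $j$ on day (round) $s$. The chance of process $j$ being discovered by a process $i$ by day (round) $t$ is given by:

$$K^t_{ij} = 1 - \prod_{s=1}^{t} \left(1 - p_j(s)\right) \tag{3}$$

As the number of days (rounds) increases, $\prod_{s=1}^{t} (1 - p_j(s))$ approaches 0, provided the cheating probabilities do not decay too quickly (for instance, when $p_j(s) \ge \epsilon$ for some fixed $\epsilon > 0$). Thus, the value of $K^t_{ij}$ tends to 1.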
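A minimal Python sketch of the time-varying case follows. It assumes the chance of discovery after a sequence of days is one minus the product of the daily probabilities of not cheating, with days treated as independent. The function name `certainty_varying` is a hypothetical helper.

```python
from typing import Iterable


def certainty_varying(probabilities: Iterable[float]) -> float:
    """Chance that a process has cheated at least once, given its
    per-day cheating probabilities (assumed independent across days)."""
    not_caught = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be in [0, 1]")
        not_caught *= 1.0 - p
    return 1.0 - not_caught


if __name__ == "__main__":
    # A constant probability reproduces the fixed-probability case.
    print(certainty_varying([0.1] * 10), 1.0 - 0.9 ** 10)
    # A cheater that only misbehaves on some days.
    print(certainty_varying([0.5, 0.0, 0.5]))
    # A cheater whose probability decays quickly over time.
    print(certainty_varying(1.0 / (s + 1) ** 2 for s in range(1, 1000)))
```

The last example is worth noting: when the daily probability decays as $1/(s+1)^2$, the certainty levels off near 0.5 instead of approaching 1. Convergence to 1 therefore relies on the cheating probabilities not shrinking too fast, for example by staying above some fixed positive bound.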

#### 5.2.1 Probability of Detection

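For two Byzantine processes $j$ and $k$ with time-varying cheating probabilities $p_j(s)$ and $p_k(s)$, the difference of the chances of being detected by day (round) $t$ is given by:

$$
\begin{aligned}
\Delta^t_{jk} &= \left(1 - \prod_{s=1}^{t} \left(1 - p_j(s)\right)\right) - \left(1 - \prod_{s=1}^{t} \left(1 - p_k(s)\right)\right) \\
&= \prod_{s=1}^{t} \left(1 - p_k(s)\right) - \prod_{s=1}^{t} \left(1 - p_j(s)\right)
\end{aligned}
\tag{4}
$$

If cheater $j$ is more likely to cheat than cheater $k$, e.g., $p_j(s) > p_k(s)$ for all $s$, then every factor $1 - p_j(s)$ is smaller than the corresponding factor $1 - p_k(s)$, and we can deduce from the above equation that $\Delta^t_{jk} > 0$. Hence, we reach a similar conclusion: a randomized Byzantine process with a higher cheating probability has a higher chance of being caught every single day (round) in synchronous (asynchronous) BFT systems.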

## 6 Conclusion

In this paper, we show that the ability to reason about common knowledge of Byzantine processes is crucial in BFT systems. We have presented an approach that uses a semi-formal model to compute the probability that the existence of a Byzantine process becomes common knowledge of all processes. We have addressed the Byzantine fault detection problem using a matrix form of probabilities. This is an important problem for ensuring the sustainability of trustless systems.

We have found several interesting properties of common knowledge of probabilistic Byzantine processes from the study of our model. As time goes by (measured in days in the synchronous case, or in rounds in the asynchronous case), all cheaters in the network are not only detected, but their existence also becomes common knowledge for all processes. We have also shown, in line with intuition, that the higher the probability of cheating by a process, the higher the chance of that process being found by the network.

## 7 References

- [1] M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter, and J. J. Wylie. Fault-scalable byzantine fault-tolerant services. ACM SIGOPS Operating Systems Review, 39(5):59–74, 2005.
- [2] I. Abraham, S. Devadas, D. Dolev, K. Nayak, and L. Ren. Efficient synchronous byzantine consensus. arXiv preprint arXiv:1704.02397, 2017.
- [3] J. Aspnes. Randomized protocols for asynchronous consensus. Distributed Computing, 16(2-3):165–175, 2003.
- [4] C. Attiya, D. Dolev, and J. Gil. Asynchronous byzantine consensus. In Proceedings of the third annual ACM symposium on Principles of distributed computing, pages 119–133. ACM, 1984.
- [5] M. Castro and B. Liskov. Practical byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, OSDI ’99, pages 173–186, Berkeley, CA, USA, 1999. USENIX Association.
- [6] S.-M. Choi, J. Park, Q. Nguyen, and A. Cronje. Fantom: A scalable framework for asynchronous distributed systems. CoRR, abs/1810.10360, 2018.
- [7] J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira. Hq replication: A hybrid quorum protocol for byzantine fault tolerance. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 177–190, 2006.
- [8] C. Dwork and Y. Moses. Knowledge and common knowledge in a byzantine environment i: crash failures. In Theoretical Aspects of Reasoning about Knowledge, pages 149–169. Elsevier, 1986.
- [9] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: speculative byzantine fault tolerance. ACM SIGOPS Operating Systems Review, 41(6):45–58, 2007.
- [10] L. Lamport et al. Paxos made simple. ACM Sigact News, 32(4):18–25, 2001.
- [11] L. Lamport, R. Shostak, and M. Pease. The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, July 1982.
- [12] A. Miller, Y. Xia, K. Croman, E. Shi, and D. Song. The honey badger of bft protocols. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 31–42. ACM, 2016.
- [13] A. Mostéfaoui, H. Moumen, and M. Raynal. Signature-free asynchronous binary byzantine consensus with t < n/3, O(n²) messages, and O(1) expected time. Journal of the ACM (JACM), 62(4):31, 2015.
- [14] Q. Nguyen, A. Cronje, M. Kong, A. Kampa, and G. Samman. StakeDag: Stake-based Consensus For Scalable Trustless Systems. CoRR, abs/1907.03655, 2019.
- [15] P. Panangaden and K. Taylor. Concurrent common knowledge: defining agreement for asynchronous systems. Distributed Computing, 6(2):73–93, 1992.
- [16] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. Journal of the ACM (JACM), 27(2):228–234, 1980.
- [17] R. Rodrigues, M. Castro, and B. Liskov. Base: Using abstraction to improve fault tolerance. In ACM SIGOPS Operating Systems Review, volume 35, pages 15–28. ACM, 2001.
