The fault-tolerant consensus problem proposed by Lamport et al.  has been studied extensively under different point-to-point network models, including complete networks (e.g., [32, 19, 1]) and undirected networks (e.g., [20, 17]). Recently, many works are exploring various consensus problems in directed networks, e.g., [11, 8, 9, 27, 13], including our own work [38, 40, 36]. More precisely, these works address the problem in incomplete directed networks, i.e., not every pair of nodes is connected by a channel, and the channels are not necessarily bi-directional. We will often use the terms graph and network interchangeably. In this work, we explore the crash-tolerant approximate consensus problem in asynchronous incomplete networks under different restrictions on topology knowledge – where we assume that each node knows all its neighbors of at most -hop distance – and relay depth – the maximum number of hops that information (or a message) can be propagated. These constraints are common in large-scale networks to avoid memory overload and network congestion, e.g., neighbor table and Time to live (TTL) (or hop limit) in the Internet Protocol. We consider both undirected and directed graphs in this paper.
Motivation Prior results [38, 13] showed that exact crash-tolerant consensus is solvable in synchronous networks with only one-hop knowledge and relay depth , i.e., each node only needs to know its immediate incoming and outgoing neighbors, and no message needs to be relayed (or forwarded). Such a local algorithm is of interest in practice due to low deployment cost and low message complexity in each round. In asynchronous undirected networks, there exists a simple flooding-based algorithm adapted from [20, 17] that achieves approximate consensus with up to crash faults if the network satisfies node-connectivity (for brevity, we will simply use the term “connectivity” in the presentation below) and , where is the number of nodes. However, these two conditions are not sufficient for an iterative algorithm with one-hop knowledge and relay depth , in which each node maintains a state and exchanges state values with only one-hop neighbors in each iteration.
Consider Figure 0(a), which is a ring network of four nodes. There is no iterative algorithm with one-hop knowledge and relay depth under one crash fault. The adversary can divide the nodes into disjoint sets and such that the communication delay across sets is so large that thinks has crashed, and thinks has crashed, and similarly for the pair and . As a result, no exchange of state values is possible across the sets in the execution; hence, consensus is not possible (a more precise discussion in Section 3). On the other hand, suppose each node has two-hop knowledge, i.e., a complete topology knowledge in this network, and relay depth . Then knows that it will be able to receive state values from at least two of the other nodes since the node connectivity is , and up to one node may fail. Following this observation, it is easy to design a flooding-based algorithm in the ring network based on [20, 17]. This example shows that both topology knowledge and relay depth affect the feasibility of asynchronous approximate consensus.
Interestingly, increasing connectivity alone does not make an iterative algorithm feasible. In Section 5.1, we show that no fault-tolerant approximate consensus algorithm with one-hop topology and relay depth exists in the network in Figure 0(b), which has two sparsely-connected cliques of size and connectivity . Motivated by these observations, this work addresses the following question in asynchronous systems:
What is a tight condition on the underlying communication graphs for achieving approximate consensus if each node has only a -hop topology knowledge and relay depth ?
Approximate Consensus We focus on the asynchronous approximate consensus problem. The system consists of nodes, of which at most nodes may crash. Each node is given an input, and after a finite amount of time, each fault-free node should produce an output, which satisfies validity and agreement conditions (formally defined later). Intuitively, the states at fault-free nodes must be in the range of all the inputs, and are guaranteed to be within of each other for some after a sufficiently large number of rounds. (In the literature, this is also called asymptotic consensus; here, we use the term “approximate consensus” following the work [19, 38].)
In , we presented Condition CCA (definition in Section 2) and showed that it is necessary and sufficient on the underlying directed graphs for achieving approximate consensus in asynchronous systems . The approximate consensus algorithms in prior work [38, 20, 17] are based on flooding (i.e., relay depth ) and assume that each node has -hop topology knowledge. However, such an algorithm is not practical in a large-scale network, since nodes’ local memory may not be large enough to store the entire network, flooding-based algorithms (e.g., [38, 20, 17]) incur prohibitively high message overhead for each phase, and complete topology knowledge may require a high deployment and configuration cost. Therefore, we explore algorithms that only require “local” knowledge and limited message relay.
Contributions We identify tight conditions on the graphs under different assumptions on topology knowledge and relay depth. Particularly, we have the following results:
Limited Topology Knowledge and Relay Depth (Section 3): We consider the case with -hop topology knowledge and relay depth . The family of algorithms that captures these constraints are iterative -hop algorithms – nodes only have topology knowledge of their -hop neighborhoods, and propagate state values to nodes that are at most -hops away. Note that no other information is relayed. For iterative -hop algorithms, we derive a family of tight conditions, namely Condition -CCA for , for solving approximate consensus in directed networks. To prove the tightness of the conditions, we propose a family of iterative algorithms called -LocWA and show how the convergence time and the message complexity of those algorithms are affected by , providing the respective upper bounds.
Topology Discovery and Unlimited Relay Depth (Section 4): We consider the case with one-hop topology knowledge and relay depth . In other words, nodes initially only know their immediate incoming and outgoing neighbors, but nodes can flood the network, learn (some part of) the topology, and eventually solve consensus based on the learned topology. We show that Condition CCA from  is also sufficient in this case. Since we assume only one-hop knowledge, our result implies that Condition CCA is tight for any -hop topology knowledge. One contribution that may be of independent interest is a topology discovery mechanism to learn and “estimate” the topology in asynchronous directed networks with crash faults. Such a discovery mechanism will be useful for self-stabilization and reconfiguration of a large-scale system.
In Section 5, we discuss fault-tolerance implications of the derived conditions and Condition CCA. We also discuss how to speed up our algorithms in terms of real time delay.
Related Work There is a large body of work on fault-tolerant consensus. Here, we discuss related works exploring consensus under different assumptions on graphs. Fischer et al.  and Dolev  characterized necessary and sufficient conditions under which Byzantine consensus is solvable in undirected graphs. In synchronous systems, Charron-Bost et al. [11, 12] solved approximate crash-tolerant consensus in dynamic directed networks using local averaging algorithms, and in the asynchronous setting, Charron-Bost et al. [11, 12] addressed approximate consensus with crash faults in complete graphs which are necessarily undirected. We solve the problem in incomplete directed graphs in asynchronous systems. Moreover, in [11, 12], nodes are constrained to only have the one-hop topology knowledge. We study different types of algorithms, including the ones that allow nodes to learn the topology (i.e., we allow topology discovery).
There were also works studying limited topology knowledge. Su and Vaidya  identified the condition for solving synchronous Byzantine consensus using a variation of -hop algorithms. Alchieri et al.  studied the synchronous Byzantine problem under unknown participants. We consider asynchronous systems in this work. Nesterenko and Tixeuil  studied the topology discovery problem in the presence of Byzantine faults in undirected networks, whereas we present a solution that works in directed networks with crash faults.
Extensive prior works studied graph properties for other similar problems in the presence of Byzantine failures, such as (i) Byzantine approximate consensus in directed graphs using “local averaging” algorithms wherein nodes only have one-hop neighborhood knowledge (e.g., [40, 39, 36, 24, 43, 42, 16]), (ii) Byzantine consensus with unknown participants , and (iii) Byzantine consensus with authentication in undirected networks . These papers only consider synchronous systems, and our algorithms and analysis are significantly different from those developed for Byzantine algorithms. A further line of work, (iv), studies consensus problems in synchronous dynamic networks where the adversary can change the network topology. In this line of work, impossibility results for Consensus and -Set Agreement are given in [7, 10] and sufficiency is guaranteed by requiring a period of stability, during which certain nodes are strongly connected; the first tight condition for the feasibility of consensus and broadcast is presented in . Additionally, in , Byzantine corruptions and a dynamic node set are assumed, and a -round randomized algorithm is presented. Our work is different from all these works because of the assumption of asynchronous systems and limited topology information. Please refer to our technical report  for further discussion on these works.
Before presenting the results, we introduce our system model, some terminology, and our prior results from  to facilitate the discussion.
System Model The point-to-point message-passing network is static, and it is represented by a simple directed graph , where is the set of nodes, and is the set of directed edges between the nodes in . The communication links are reliable. We assume that , since the consensus problem for is trivial. Node can transmit messages to another node directly if directed edge is in . Each node can send messages to itself as well; however, for convenience, we exclude self-loops from set . We will use the terms edge and link interchangeably.
Up to nodes may suffer crash failures in an execution. A node that suffers a crash failure simply stops taking steps (i.e., fail-stop model). We consider the asynchronous message-passing communication, in which a message may be delayed arbitrarily but eventually delivered if the receiver node is fault-free. We assume that the adversary has both the control of crashing nodes and delaying messages at any point of time during the execution.
Terminology Upper case letters are used to name sets. Lower case italic letters are used to name nodes. All paths used in our discussion are directed paths.
Node is said to be an incoming neighbor of node if . Let be the set of incoming neighbors of node , i.e., . Define as the set of outgoing neighbors of node , i.e., .
For set , node is said to be an incoming neighbor of set if , and there exists such that . Given subsets of nodes and , set is said to have incoming neighbors in set if contains distinct incoming neighbors of . Given disjoint non-empty subsets of nodes and , if has at least distinct incoming neighbors in . When it is not true that , we will denote that fact by .
Approximate Consensus For the approximate consensus problem (e.g., [19, 26, 38]), it is usually assumed that each node maintains a state with denoting the state of node at the end of phase (or iteration) . The initial state of node , , is equal to the initial input provided to node . At the start of phase , the state of node is .
Let and be the maximum and the minimum state at nodes that have not crashed by the end of phase . Then, a correct approximate consensus algorithm needs to satisfy the following two conditions:
Validity: and ; and
Equivalently the Convergence condition can be stated as:
Towards facilitating the study of the number of phases needed for convergence and the corresponding message complexity, observe that convergence with respect to a specific must be considered. Therefore we will also use the following convergence notion.
-Convergence: , , .
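For reference, one common way to write the three conditions above is sketched here. The symbols $M[t]$ and $m[t]$ (the maximum and minimum state among nodes that have not crashed by the end of phase $t$) and $t_\epsilon$ are assumed notation for this sketch; they follow the standard formulation of approximate consensus and are not quoted from the text.

```latex
\begin{align*}
\text{Validity:} \quad
  & \forall t > 0:\; M[t] \le M[0] \ \text{ and } \ m[t] \ge m[0] \\
\text{Convergence:} \quad
  & \lim_{t \to \infty} \bigl( M[t] - m[t] \bigr) = 0 \\
\text{Equivalently:} \quad
  & \forall \epsilon > 0,\ \exists t_\epsilon \ \text{such that} \
    \forall t \ge t_\epsilon:\; M[t] - m[t] \le \epsilon \\
\epsilon\text{-Convergence (for a given } \epsilon\text{):} \quad
  & \exists t_\epsilon \ \text{such that} \
    \forall t \ge t_\epsilon:\; M[t] - m[t] \le \epsilon
\end{align*}
```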
In , we identified necessary and sufficient conditions on the underlying communication graphs for achieving crash-tolerant consensus in directed networks.
The theorem below requires the communication graph to satisfy Condition CCA
Approximate crash-tolerant consensus in asynchronous systems is feasible iff for any partition of , where and are both non-empty,
either or . (Condition CCA)
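For small graphs, a condition of this shape can be checked by brute force. The sketch below assumes that the partition consists of three sets (called L, C, R here), that the relation between two sets holds when the target set has at least f + 1 distinct incoming neighbors in the source set, and that the condition pairs L ∪ C with R and R ∪ C with L. The threshold f + 1, the set names, and all function names are assumptions made for illustration.

```python
from itertools import product


def incoming_neighbors_of_set(edges, B):
    """Nodes outside B that have a directed edge into some node of B."""
    return {u for (u, v) in edges if v in B and u not in B}


def arrow(edges, A, B, f):
    """Assumed relation: B has at least f + 1 distinct incoming neighbors in A."""
    return len(incoming_neighbors_of_set(edges, B) & A) >= f + 1


def satisfies_cca(nodes, edges, f):
    """Try every partition (L, C, R) with L and R non-empty (3^n cases)."""
    nodes = sorted(nodes)
    for labels in product("LCR", repeat=len(nodes)):
        L = {v for v, s in zip(nodes, labels) if s == "L"}
        C = {v for v, s in zip(nodes, labels) if s == "C"}
        R = {v for v, s in zip(nodes, labels) if s == "R"}
        if not L or not R:
            continue
        if not (arrow(edges, L | C, R, f) or arrow(edges, R | C, L, f)):
            return False  # this partition violates the condition
    return True


def bidirectional_ring(n):
    """Ring of n nodes in which every link is usable in both directions."""
    edges = set()
    for i in range(n):
        j = (i + 1) % n
        edges.add((i, j))
        edges.add((j, i))
    return edges
```

Under these assumptions, the four-node ring passes the check for one fault and fails it for two, which matches the undirected characterization (connectivity and number of nodes) mentioned in the introduction. The enumeration is exponential in the number of nodes, so it is only meant for toy examples.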
3 Limited Topology Knowledge and Relay Depth
In this section, we study how topology knowledge and the relay depth affect the tight conditions on the directed communication network. Particularly, we consider the case with -hop topology knowledge and relay depth for . Prior works (e.g., [38, 20, 17]) assumed that each node has -hop topology knowledge and relay depth . However, in large-scale networks, such an assumption may not be realistic. Therefore, we are interested in the algorithms that only require nodes to exchange a small amount of information within local neighborhood (e.g., [33, 30, 31]). One other benefit is that the algorithms do not require flooding  or all-to-all communication [20, 17] in each asynchronous phase.
We are interested in iterative -hop algorithms – nodes only have topology knowledge in their -hop neighborhoods, and propagate state values to nodes that are at most -hops away. We introduce a family of conditions, namely Condition -CCA for , which we prove necessary and sufficient for achieving asynchronous approximate consensus, through the use of iterative -hop algorithms. The results presented in this section also imply how affects the tight conditions on the directed networks – lower requires higher connectivity of the underlying communication network.
To the best of our knowledge, two prior papers [2, 36] examined a similar problem – synchronous Byzantine consensus. In , Su and Vaidya identified the condition under different relay depths. Alchieri et al.  studied the problem under unknown participants. The technique developed for asynchronous consensus in this section is significantly different.
Iterative -hop Algorithms The iterative algorithms considered here have relay depth and require each node to perform the following three steps in asynchronous phase :
1. Transmit: Transmit messages of the form to nodes that are reachable from node via at most hops, where is the current state value. If node is an intermediate node on the route of some message, then node forwards that message as instructed by the source;
2. Receive: Receive messages from the nodes that can reach node via at most hops. Denote by the set of messages that node received at phase ; and
3. Update: Update state using a transition function , where is a part of the specification of the algorithm, and takes as input the set . i.e.,
Note that (i) no exchange of topology information takes place in this class of algorithms, and (ii) each node’s state only propagates within its -hop neighborhood. For a node , its -hop incoming neighbors are defined as the nodes which are connected to by a directed path in that has hops. The notion of -hop outgoing neighbors is defined similarly.
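The notion of a bounded-hop incoming neighborhood can be computed with a depth-limited breadth-first search on the reversed graph. The sketch below reads the definition as "a directed path of at most k hops", which is an assumption since the bound is not visible in the text above; the function name and the edge-set representation are likewise illustrative.

```python
from collections import deque


def k_hop_incoming_neighbors(edges, i, k):
    """Nodes j != i with a directed path j -> ... -> i of at most k hops."""
    # Index predecessors so the search can walk edges backwards from i.
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)

    dist = {i: 0}  # shortest hop distance to i found so far
    queue = deque([i])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond k hops
        for u in preds.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist) - {i}
```

The outgoing variant is obtained by indexing successors instead of predecessors.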
Technique The algorithms presented in this section are motivated by prior work [19, 36] including our own work . The algorithms are iterative and simple; thus, the proof structure shares some similarity with prior work [19, 38, 40].
Generally speaking, the proof proceeds as follows: (i) nodes are divided into two disjoint sets, say and so that nodes have “closer” state values in each set; (ii) because each node receives an adequate set of messages, we show that under any delay and crash scenarios, at least one non-crashed node in either or will receive one message from the other set of nodes in each phase; and (iii) after enough phases, the value of all non-crashed nodes in either or will move “closer” to the values in the other set. Two key novelties are: identifying the “adequate set” of messages that needs to be received before updating local state in each asynchronous phase, and showing that with limited -hop propagation, some node is still able to receive messages from the other set (in step (ii) above).
To initiate the study, we first consider the one-hop case, where each node only knows its one-hop incoming and outgoing neighbors. The following notion is crucial for the characterization of graphs in which asynchronous approximate consensus is feasible with relay depth .
Given disjoint non-empty subsets of nodes and , we will use the notation if there exists a node in such that has at least distinct incoming neighbors in . When it is not true that , we will denote that fact by .
Condition -CCA, presented below, proves to be necessary and sufficient for achieving asynchronous approximate consensus with relay depth .
[Condition -CCA] For any partition of , where and are both non-empty, either or .
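The one-hop condition differs from Condition CCA in that a single node of the target set must by itself have enough incoming neighbors in the source set. The brute-force sketch below assumes a threshold of f + 1 distinct incoming neighbors and a three-set partition (L, C, R) paired as L ∪ C versus R and R ∪ C versus L; these details and the function names are assumptions for illustration.

```python
from itertools import product


def arrow_one_hop(in_nbrs, A, B, f):
    """Assumed relation: some node of B has at least f + 1 incoming neighbors in A."""
    return any(len(in_nbrs.get(v, set()) & A) >= f + 1 for v in B)


def satisfies_one_hop_cca(nodes, edges, f):
    """Try every partition (L, C, R) with L and R non-empty (3^n cases)."""
    nodes = sorted(nodes)
    in_nbrs = {}
    for u, v in edges:
        in_nbrs.setdefault(v, set()).add(u)
    for labels in product("LCR", repeat=len(nodes)):
        L = {v for v, s in zip(nodes, labels) if s == "L"}
        C = {v for v, s in zip(nodes, labels) if s == "C"}
        R = {v for v, s in zip(nodes, labels) if s == "R"}
        if not L or not R:
            continue
        if not (arrow_one_hop(in_nbrs, L | C, R, f)
                or arrow_one_hop(in_nbrs, R | C, L, f)):
            return False  # this partition violates the condition
    return True
```

With these assumptions, the four-node bidirectional ring fails the check for one fault: splitting the ring into two adjacent pairs leaves every node with only one incoming neighbor on the other side. A complete graph on four nodes passes.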
The necessity of Condition -CCA is similar to the necessity proof of Condition CCA in  and is presented in Appendix B. For sufficiency, we present Algorithm LocWA (Local-Wait-Average) below, which is inspired by Algorithm WA , and utilizes only one-hop information. Recall that by definition, no message relay with depth greater than is allowed. In Algorithm LocWA, is the set of one-hop incoming neighbors of from which has received values during phase . Each node performs the averaging operation to update its state value when Condition 1-WAIT below holds for the first time in phase .
Condition 1-WAIT: The condition is satisfied at node , in phase , when , i.e., when has not received values from a set of at most incoming neighbors.
Algorithm LocWA for node
input at node
For phase :
*On entering phase :
Send message to all the outgoing neighbors
*When message is received for the first time:
// is a multiset
*When Condition 1-WAIT holds for the first time in phase :
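To make the wait-then-average pattern concrete, here is a small phase-by-phase simulation. It is a simplified sketch rather than the algorithm itself: it assumes no node crashes, models asynchrony by letting a seeded random scheduler choose which incoming neighbors are heard before a node stops waiting, assumes the waiting rule allows at most f incoming neighbors to be missing, and assumes the update is the plain average of the received multiset together with the node's own state. The function name and data layout are hypothetical.

```python
import random
from fractions import Fraction


def simulate_locwa(in_nbrs, inputs, f, phases, seed=0):
    """Return the list of per-phase state snapshots (index 0 holds the inputs).

    in_nbrs[i] is the set of one-hop incoming neighbors of node i.
    """
    rng = random.Random(seed)
    state = dict(inputs)
    history = [dict(state)]
    for _ in range(phases):
        new_state = {}
        for i, nbrs in in_nbrs.items():
            ordered = sorted(nbrs)
            # Assumed waiting rule: values from at most f incoming neighbors are missing.
            need = max(len(ordered) - f, 0)
            heard = rng.sample(ordered, rng.randint(need, len(ordered)))
            # Multiset of received values, including the node's own state.
            received = [state[i]] + [state[j] for j in heard]
            new_state[i] = sum(received) / len(received)
        state = new_state
        history.append(dict(state))
    return history
```

Passing `Fraction` inputs keeps the arithmetic exact, which makes it easy to check that every state stays inside the range of the inputs and that the spread between the largest and smallest state never grows.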
To prove the correctness of LocWA, we will use the supplementary definitions below.
For disjoint sets , denotes the set of all the nodes in that each have at least incoming edges from nodes in . When , define . Formally, .
For non-empty disjoint sets and , set is said to propagate to set in steps, where , if there exist sequences of sets and (propagating sequences) such that
, , , , for , and
for , (i) ; (ii) ; and
Observe that and form a partition of , and for , . We say that set propagates to set if there is a propagating sequence for some steps as defined above. Note that the number of steps in the above definition is upper bounded by , since set must be of size at least for it to propagate to ; otherwise, .
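Because each step of a propagating sequence is determined by the current pair of sets, the sequence can be computed greedily. The sketch below assumes that a node of the target set joins the source set once it has at least f + 1 incoming edges from the source set, and that the sequence ends when the target set is empty; the threshold and the function names are assumptions for illustration.

```python
def in_set(in_nbrs, A, B, f):
    """Nodes of B with at least f + 1 incoming edges from nodes in A (assumed threshold)."""
    return {v for v in B if len(in_nbrs.get(v, set()) & A) >= f + 1}


def propagation_steps(in_nbrs, A, B, f):
    """Number of steps in which A propagates to B, or None if it does not."""
    A, B = set(A), set(B)
    steps = 0
    while B:
        moved = in_set(in_nbrs, A, B, f)
        if not moved:
            return None  # the sequence is stuck while B is still non-empty
        A |= moved  # the source set absorbs the newly reached nodes
        B -= moved  # the target set loses them
        steps += 1
    return steps
```

For example, with one fault, if node 2 has incoming neighbors {0, 1} and node 3 has incoming neighbors {1, 2}, then {0, 1} propagates to {2, 3} in two steps: node 2 is absorbed first, after which node 3 has two incoming neighbors in the enlarged source set.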
Now, we present two key lemmas whose proofs are presented in Appendix C. In the discussion below, we assume that satisfies Condition -CCA.
For any partition of , where are both non-empty, either propagates to , or propagates to .
The lemma below states that the interval to which the states at all the fault-free nodes are confined shrinks after a finite number of phases of Algorithm LocWA. Recall that and denote the maximum and minimum states at the fault-free nodes at the end of the -th phase.
Suppose that at the end of the -th phase of Algorithm LocWA, can be partitioned into non-empty sets and such that (i) propagates to in steps, and (ii) the states of fault-free nodes in are confined to an interval of length . Then, with Algorithm LocWA,
If satisfies Condition -CCA, then Algorithm LocWA achieves both Validity and Convergence.
Proof Sketch: To prove the Convergence of LocWA, we show that given any , there exists such that . Consider the -th phase, for some . If , then the algorithm has already converged; thus, we consider only the case where . In this case, we can partition into two subsets, and , such that, for each fault-free node , , and for each fault-free node , . (The full proof in  identifies how to partition the nodes.) By Lemma 3.1, we have that either propagates to set or propagates to . In both cases above, we have found two non-empty sets (or ) and (or ) that partition and satisfy the hypothesis of Lemma 3.1, since propagates to and the states of all fault-free nodes in are confined to an interval of length . The theorem is then proven by using simple algebra and the fact that the interval to which the states of all the fault-free nodes are confined shrinks after a finite number of phases.
3.2 General Case
Now, consider the case when each node only knows its -hop neighbors and the relay depth is . In the following, we generalize the notions presented above to the -hop case. For node , denote by the set of ’s -hop incoming neighbors. For a set of nodes , let be the set of ’s one-hop incoming neighbors. Formally, . Next we define the relation for the -hop case.
 Given disjoint non-empty subsets of nodes and , we will say that holds if there exists a node in for which there exist at least node-disjoint paths of length at most from distinct nodes in to . More formally, if is the family of all sets of node-disjoint paths (with being their only common node) initiating in and ending in node , means that .
[Condition -CCA] For any partition of , where and are both non-empty, either or .
The necessity of Condition -CCA for achieving asynchronous approximate consensus through an iterative -hop algorithm holds analogously to the one-hop case, where a set of incoming neighbors of node has to be replaced with a set of distinct nodes that reach through disjoint paths. For sufficiency, we next present a generalization of Algorithm LocWA for the -hop case. There are two differences between Algorithms -LocWA and LocWA: (i) each node transmits its state to all its -hop outgoing neighbors, and (ii) Algorithm -LocWA relies on the generalized version of Condition 1-WAIT, presented below.
Condition -WAIT: For , we denote with the set of nodes that have paths of length to node in . That is, the set of -hop incoming neighbors of that remain connected with even when all nodes in set crash. The condition is satisfied at node , in phase if there exists with such that .
Algorithm -LocWA for node
input at node
For phase :
*On entering phase :
Send message to nodes in , all -hop outgoing neighbors // For brevity, we do not specify how the network routes the messages within the -hop neighborhood; this can be achieved by local flooding with a hop counter tagged in each message.
* When message is received for the first time:
// is a multiset
* When Condition -WAIT holds for the first time in phase :
Correctness of Algorithm -LocWA The correctness proof of -LocWA follows reasoning similar to that of LocWA. The key here is to identify Condition -CCA and Condition -WAIT so that the proof structure remains almost identical. To adapt the arguments to the general case, one should state the analogous definition based on the general notion .
For disjoint sets , denotes the set of all the nodes in for which there exist at least incoming disjoint paths of length at most from distinct nodes in to . When , define . Formally, in terminology of Definition 3.2:
The correctness proof of Algorithm -LocWA is similar to the proof of Theorem 3.1; remarks on the arguments’ adaptations are presented in the proof sketch of the following theorem.
Approximate crash-tolerant consensus in an asynchronous system using iterative -hop algorithms is feasible iff satisfies Condition -CCA.
Proof Sketch: Having defined the basic notion , Definition 3.1 of the notion propagates to is the same for the -hop case. Intuitively, if propagates to , information will be propagated gradually from to in steps; corruption of any faulty set of nodes will not be able to block propagation to a specific node because the definition of guarantees that will receive information from at least disjoint paths if it has not crashed. A difference from the original case is that for each of the steps needed to propagate from to , communication steps will be required in the worst case, since information may be propagated through paths of length . Lemma 3.1 is intuitively the same since it is based on the general propagation notion, but value , which is defined based on the number of incoming neighbors, will now be defined on the number of -hop incoming neighbors, i.e., . The main correctness proof remains essentially the same since it repeatedly makes use of the abstract propagation notion between various sets, without focusing on how the values are propagated.
3.3 Condition Relation and Convergence Time Comparison
Next, we first compare the feasibility of approximate consensus for different values of by presenting a relation among the various -CCA conditions as well as their relation with Condition CCA from .
We first show that lower requires higher connectivity of the graph as stated below.
For values with , Condition -CCA implies Condition -CCA.
Let Condition -CCA hold and assume, without loss of generality, that holds for a partition . This means that there exists a node in that has at least incoming disjoint paths of length at most initiating from distinct nodes in . Consequently, the same paths will constitute ’s incoming disjoint paths of length at most , since , and thus, which means that -CCA holds. ∎
We next show that Condition CCA is equivalent to Condition -CCA. The proof illustrates how the locally defined Condition -CCA naturally coincides with the globally defined condition CCA in the extreme case.
Condition CCA is equivalent to Condition -CCA.
It is easy to see that Condition -CCA implies Condition CCA. If Condition CCA is violated in , then Condition -CCA does not hold either, since and have at most one-hop incoming neighbors.
Now, we show the other direction. Assume for the sake of contradiction that Condition CCA holds but Condition -CCA does not. Then, there exists a partition with such that and . Since Condition CCA holds, we have that either or . Now consider the case that and . This means that and . The case of and is symmetrical and the case of and can be proved by applying the argument below once for set and once for set .
Let be the node in with the maximum number of disjoint paths initiating from distinct nodes in (as implied by Definition 3.2). The fact implies that . Subsequently, implies that the set is non-empty (the maximal subset of which does not contain any -hop incoming neighbors of ). Let be the set of all the outgoing -hop neighbors of all nodes confined in the set . By definition of and , it holds that . We can now create a new partition by moving from to . For partition it holds that since and . Moreover, it holds that (i) , since and ; and (ii) since . The latter points imply that and , which yield a contradiction to the hypothesis that Condition CCA holds. This completes the proof. ∎
Convergence Time Comparison
We derive upper bounds on the number of asynchronous phases needed for -convergence of Algorithm -LocWA and its message complexity up to this -convergence point. These upper bounds are functions of values and which are naturally expected to affect the convergence time and message complexity. Moreover, since the bounds depend on , they provide a way to compare the convergence time and message complexity of Algorithms -LocWA for different values of . We will use the following lemma to compute the number of phases needed for -convergence of Algorithm -LocWA.
For any phase of -LocWA, if , then there exists an integer , such that, for , the following holds,
The proof of the Lemma is given in the proof of Theorem 3.1 and is based on the generalization of Lemma 3.1 to the -hop case (which is obtained by replacing with ). Next we present the upper bound on the convergence time of -LocWA. The Theorem can be proved by repeatedly applying Lemma 3.3 until the value is less than . The full proof is in .
[Convergence-time complexity] The number of phases required by Algorithm -LocWA to -converge is .
The idea is to repeatedly apply Lemma 3.3 until the value is less than .
Observe that , else Condition -CCA is violated. Also, and ; hence, . We will denote by for succinctness. Assume wlog that , and define the following sequence of phase indices:
for , , where for any given is defined by Lemma 3.3.
By repeated application of Lemma 3.3, we have that for ,
so, -convergence will be achieved in phase , where . Since for every , we have that,
By the definition of the sequence and the bound of all we have that . Thus, the algorithm will -converge by phase at the latest. ∎
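The counting step behind this kind of bound can be written out generically. The sketch below assumes that one application of Lemma 3.3 contracts the spread between the maximum and minimum state by a fixed factor; the symbols $\gamma \in (0,1)$ (the contraction achieved by one application), $\delta$ (the initial spread, with $\delta > \epsilon > 0$), $r$ (the number of applications), and $t_r$ (the phase reached after $r$ applications) are placeholders introduced here, and the actual factor is the one given by the lemma.

```latex
\begin{align*}
& M[t_{j+1}] - m[t_{j+1}] \le (1 - \gamma)\,\bigl(M[t_j] - m[t_j]\bigr)
  && \text{(assumed form of one application)} \\
& M[t_r] - m[t_r] \le (1 - \gamma)^{r}\,\delta,
  \qquad \delta = M[0] - m[0]
  && \text{(after $r$ applications, with $t_0 = 0$)} \\
& (1 - \gamma)^{r}\,\delta \le \epsilon
  \iff r \ge \frac{\ln(\delta / \epsilon)}{\ln\bigl(1 / (1 - \gamma)\bigr)}
  && \text{(solve for $r$)}
\end{align*}
```

Multiplying the resulting $r$ by the maximum number of phases spanned by one application gives a bound on the phase by which $\epsilon$-convergence is reached.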
Comparison of Algorithms -LocWA Convergence Observe that the above bound decreases, as the maximum number of -hop incoming neighbors increases, since . Since the maximum number of -hop incoming neighbors increases with we have that for , Algorithm -LocWA -converges faster than -LocWA by a factor implied by the bound.
Moreover, given the upper bound on phases for -convergence of Theorem 3.3 we can easily derive an upper bound on the message complexity of -LocWA. Namely,
[Message Complexity] The number of messages exchanged in an execution of Algorithm -LocWA until -convergence is
This holds because each phase of Algorithm -LocWA may require communication steps for -length paths to propagate values to a receiver. In the worst case, each node sends to all of its neighbors in every communication step. ∎
4 Topology Discovery and Unlimited Relay Depth
In this section, we consider the case with one-hop topology knowledge and relay depth . In other words, nodes initially only know their immediate incoming and outgoing neighbors, but nodes can flood the network and learn the topology. The study of this case is motivated by the observation that full topology knowledge at each node (e.g., [38, 20, 17]) requires a much higher deployment and configuration cost. We show that Condition CCA from  is necessary and sufficient for solving approximate consensus with one-hop neighborhood knowledge and relay depth in asynchronous directed networks. Compared to the iterative -hop algorithms in Section 3, the algorithms in this section are not restricted in the sense that nodes can propagate any messages to all the reachable nodes.
The necessity of Condition CCA is implied by our prior work . The algorithms presented below are again inspired by Algorithm WA from . The main contribution is to show how each node can learn “enough” topology information to solve approximate consensus – this technique may be of interest in other contexts as well. In the discussion below, we present an algorithm that works in any directed graph that satisfies Condition CCA.
Algorithm LWA The idea of Algorithm LWA (Learn-Wait-Average) is to piggyback the information of incoming neighbors when propagating state values. Then, each node will locally construct an estimated graph in every phase , and check whether Condition -WAIT holds in or not. Note that may not be equal to , as node may not receive messages from some other nodes due to asynchrony or failures. We say Condition -WAIT holds in the local estimated graph if there exists a set , where , such that . Here, is the set of nodes that have paths to node in the subgraph induced by the nodes in for and .
Recall that denotes the set of ’s one-hop incoming neighbors. Given a set of nodes and node , we also use the notation to describe a directed graph consisting of nodes and set of directed edges from each node in to . Formally, , where .
Algorithm LWA for node
input at node
For phase :
* On entering phase :
Send message to all the outgoing neighbors
* When message is received for the first time:
// is a multiset
444, where and . Note that this is not a multiset, there is only one copy of each node or edge.
Send message to all the outgoing neighbors
* When Condition -WAIT holds on for the first time in phase :
// “Reset” the learned graph
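The wait check described above can be illustrated with a small brute-force sketch. It is not the authors' implementation: the function names `reach` and `wait_condition_holds`, the edge-list representation, and the reading of the condition as "there is a candidate crash set of size at most f whose removal leaves only nodes that have already been heard from among those with a path to node i" are assumptions made for illustration, and the enumeration is exponential in f.

```python
from itertools import combinations

def reach(nodes, edges, i, removed):
    """Nodes other than i that have a directed path to i in the
    subgraph induced by nodes - removed (assumed reading of the reach set)."""
    alive = set(nodes) - set(removed)
    if i not in alive:
        return set()
    # Reverse adjacency restricted to surviving nodes.
    preds = {v: set() for v in alive}
    for u, v in edges:
        if u in alive and v in alive:
            preds[v].add(u)
    seen, stack = {i}, [i]
    while stack:
        v = stack.pop()
        for u in preds[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen - {i}

def wait_condition_holds(nodes, edges, i, heard, f):
    """True if some candidate crash set F (excluding i, |F| <= f) leaves
    every node that can still reach i inside the heard set."""
    others = sorted(set(nodes) - {i})
    heard = set(heard)
    for size in range(f + 1):
        for F in combinations(others, size):
            if reach(nodes, edges, i, F) <= heard:
                return True
    return False

# Hypothetical estimated graph at node 1: edges 2->1, 3->1, 3->2.
nodes = {1, 2, 3}
edges = {(2, 1), (3, 1), (3, 2)}
print(wait_condition_holds(nodes, edges, 1, heard=set(), f=1))
print(wait_condition_holds(nodes, edges, 1, heard={2}, f=1))
```

In this toy graph, node 1 must keep waiting while it has heard from nobody, but may stop once it has heard from node 2, because node 3 alone could be the crashed node.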
Correctness of Algorithm LWA The key lemma to prove the correctness of Algorithm WA in  is to show that for any pair of nodes that have not crashed in phase , they must receive a state value from at least one common node. In Appendix D, we show that Algorithm LWA achieves the same property. Intuitively, if Condition -WAIT does not hold in the local estimated graph , then node knows it can learn more states in phase . Also, when Condition -WAIT is satisfied in , there exists a scenario in which node cannot receive any more information; hence, it should not wait for any more messages. This is why Algorithm LWA allows each node to learn enough state values to achieve approximate consensus. We rely on this observation to prove the correctness in .
Undirected Graphs Algorithm LWA works on undirected graphs as well; however, the message size is large, since each message needs to include the information about one’s neighborhood. In Appendix E, we present an algorithm in which each node learns the topology in the first phase, and then executes an approximate consensus algorithm using the learned topology. The reasons that this trick works in undirected graphs are: (i) Condition CCA is equivalent to connectivity and in undirected graphs; and (ii) for each node, there is at least one fault-free neighbor; hence, each node is able to learn the existence of every other node.
In this section, we discuss interesting implications of the conditions derived in this paper.
In undirected graphs, -connectivity and are both necessary and sufficient for solving approximate consensus in asynchronous networks with up to crash faults (implied by [20, 17]). It is easy to show that Condition CCA for tolerating faults is equivalent to these two conditions in undirected networks. However, this equivalence does not hold for general . For example, the network in Figure 0(a) has connectivity and four nodes, but does not satisfy Condition -CCA with (when ).
More interestingly, increasing the topology knowledge and relay depth by a small amount may increase the fault-tolerance tremendously. Consider the network in Figure 0(b). Condition -CCA does not hold for (when left clique, right clique, and ). On the other hand, Condition -CCA holds for . Intuitively, this holds because each pair of nodes is at most two hops away.
5.2 Real Time Speed Up of Algorithm k-LocWA
In asynchronous systems, the real time communication delay is arbitrary but finite. In a formal framework, it is common to assume that execution proceeds in rounds representing real time intervals, but the nodes do not have knowledge of the round index. To model the worst-case real time delay in the execution of a system, we can use the notion of a delay scenario, which is a description of the delays incurred on communication over all edges of the network. The delivery delay of a message sent over a channel will be described by the number of rounds (amount of real time) that are needed for the delivery to be completed.
We first compare the real time performance of Algorithms -LocWA for different values of with respect to the real time delay. Specifically, we show that there is a case where Algorithm LocWA terminates each phase in one round (one interval of real time), while it may take an arbitrary number of rounds for Algorithm -LocWA to terminate phase 1. To formalize the comparison, we will use the notion of -convergence time of Algorithms -LocWA.
Consider the graph of Figure 1(a), which is a ring network plus a directed edge . For , it is easy to verify that Condition 1-CCA holds, which implies that Conditions -CCA, for hold. Assume that the delivery of messages through directed edges is delayed by rounds while the communication in all the other edges is instant (1 round). For ease of presentation assume that no node crashes. Then, in an execution of Algorithm LocWA, it is clear that every node will finish phase in time because in each phase it will receive a message from all of its neighbors except one in one round, and thus Condition 1-WAIT will be satisfied.
On the other hand, in an execution of Algorithm -LocWA, node will only receive a message from in one round, since is a directed edge, and delay on edges and is . In this case, will not be able to decide before round , the first round where Condition 2-WAIT will be satisfied. Specifically, for the first phase it will hold that only after round since, if considers as a possible corruption set, it has to wait for a message from which will be propagated by and setting , it has to wait for a message from . Consequently, the first time that node can decide is round where it will receive the rest of the values. For similar reasons, the same holds for nodes . Since may be an arbitrary integer, there is a delay scenario where the -convergence time for Algorithm 2-LocWA is arbitrarily larger than the -convergence time of Algorithm LocWA.
Strong version of -LocWA with respect to real time In Example 5.2, observe that in the 2-hop knowledge case (execution of 2-LocWA), a node has all the information that it would have in the 1-hop knowledge case. Therefore, it can utilize the information to update its state value in the manner that 1-LocWA does, in order to guarantee faster convergence time. As a result, the modified algorithm would always be as fast, in terms of real time, as 1-LocWA. Next, we modify the update condition of Algorithm -LocWA to capture this strengthened version with respect to real time.
Update Condition of Strong -LocWA In the strong version of Algorithm -LocWA, a node updates its value the first time that at least one of conditions -WAIT, for holds. Specifically we replace the update condition of Algorithm -LocWA with:
Update value when -WAIT for the first time in phase :
Considering this strong version of the algorithm family -LocWA, we can show that for and any , Algorithm -LocWA will -converge faster than Algorithm -LocWA. That is, for every delay scenario, the number of rounds in which -LocWA -converges is larger than the number of rounds in which -LocWA -converges. The proof is trivial, since the strengthened algorithm -LocWA will check all the update conditions for smaller values of , and the messages communicated in -LocWA are a superset of the messages communicated in -LocWA. Also observe that if -LocWA -converges then so does -LocWA. Thus we have the following Corollary.
For , if Strong -LocWA -converges in rounds then Strong -LocWA -converges in rounds with .
-  Ittai Abraham, Yonatan Amit, and Danny Dolev. Optimal resilience asynchronous approximate agreement. In OPODIS, pages 229–239, 2004.
-  Eduardo A. P. Alchieri, Alysson Neves Bessani, Joni Silva Fraga, and Fabíola Greve. Byzantine consensus with unknown participants. In Theodore P. Baker, Alain Bui, and Sébastien Tixeuil, editors, Principles of Distributed Systems, volume 5401 of Lecture Notes in Computer Science, pages 22–40. Springer Berlin Heidelberg, 2008. URL: http://dx.doi.org/10.1007/978-3-540-92221-6_4, doi:10.1007/978-3-540-92221-6_4.
-  John Augustine, Gopal Pandurangan, and Peter Robinson. Fast byzantine agreement in dynamic networks. In Proceedings of the 2013 ACM Symposium on Principles of Distributed Computing, PODC ’13, pages 74–83, New York, NY, USA, 2013. ACM. URL: http://doi.acm.org/10.1145/2484239.2484275, doi:10.1145/2484239.2484275.
-  Piyush Bansal, Prasant Gopal, Anuj Gupta, Kannan Srinathan, and Pranav Kumar Vasishta. Byzantine agreement using partial authentication. In Proceedings of the 25th international conference on Distributed computing, DISC’11, pages 389–403, Berlin, Heidelberg, 2011. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=2075029.2075079.
-  Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Optimization and Neural Computation Series. Athena Scientific, 1997.
-  Martin Biely, Peter Robinson, and Ulrich Schmid. Easy impossibility proofs for k-set agreement in message passing systems. In Proceedings of the 15th International Conference on Principles of Distributed Systems, OPODIS’11, pages 299–312, Berlin, Heidelberg, 2011. Springer-Verlag. URL: http://dx.doi.org/10.1007/978-3-642-25873-2_21, doi:10.1007/978-3-642-25873-2_21.
-  Martin Biely, Peter Robinson, and Ulrich Schmid. Agreement in directed dynamic networks. In Guy Even and Magnús M. Halldórsson, editors, Structural Information and Communication Complexity, pages 73–84, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
-  Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. Gracefully degrading consensus and k-set agreement in directed dynamic networks. CoRR, abs/1408.0620, 2014. URL: http://arxiv.org/abs/1408.0620.
-  Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. Gracefully degrading consensus and k-set agreement in directed dynamic networks. In Ahmed Bouajjani and Hugues Fauconnier, editors, Networked Systems, pages 109–124, Cham, 2015. Springer International Publishing.
-  Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. Gracefully degrading consensus and k-set agreement in directed dynamic networks. Theoretical Computer Science, 726:41 – 77, 2018. URL: http://www.sciencedirect.com/science/article/pii/S0304397518301166, doi:https://doi.org/10.1016/j.tcs.2018.02.019.
-  Bernadette Charron-Bost, Matthias Függer, and Thomas Nowak. Approximate consensus in highly dynamic networks. CoRR, abs/1408.0620, 2014. URL: http://arxiv.org/abs/1408.0620.
-  Bernadette Charron-Bost, Matthias Függer, and Thomas Nowak. Approximate consensus in highly dynamic networks: The role of averaging algorithms. In Proceedings, Part II, of the 42nd International Colloquium on Automata, Languages, and Programming - Volume 9135, ICALP 2015, pages 528–539, New York, NY, USA, 2015. Springer-Verlag New York, Inc. URL: http://dx.doi.org/10.1007/978-3-662-47666-6_42, doi:10.1007/978-3-662-47666-6_42.
-  Ashish Choudhury, Gayathri Garimella, Arpita Patra, Divya Ravi, and Pratik Sarkar. Brief announcement: Crash-tolerant consensus in directed graph revisited. In 31st International Symposium on Distributed Computing, DISC 2017, October 16-20, 2017, Vienna, Austria, pages 46:1–46:4, 2017. URL: https://doi.org/10.4230/LIPIcs.DISC.2017.46, doi:10.4230/LIPIcs.DISC.2017.46.
-  Étienne Coulouma and Emmanuel Godard. A characterization of dynamic networks where consensus is solvable. In Thomas Moscibroda and Adele A. Rescigno, editors, Structural Information and Communication Complexity, pages 24–35, Cham, 2013. Springer International Publishing.
-  Yvo Desmedt and Yongge Wang. Perfectly secure message transmission revisited. In Lars R. Knudsen, editor, Advances in Cryptology – EUROCRYPT 2002, volume 2332 of Lecture Notes in Computer Science, pages 502–517. Springer Berlin Heidelberg, 2002. URL: http://dx.doi.org/10.1007/3-540-46035-7_33, doi:10.1007/3-540-46035-7_33.
-  S. M. Dibaji, H. Ishii, and R. Tempo. Resilient randomized quantized consensus. IEEE Transactions on Automatic Control, PP(99):1–1, 2017. doi:10.1109/TAC.2017.2771363.
-  Danny Dolev. The Byzantine generals strike again. Journal of Algorithms, 3(1), March 1982.
-  Danny Dolev, Cynthia Dwork, Orli Waarts, and Moti Yung. Perfectly secure message transmission. Journal of the Association for Computing Machinery (JACM), 40(1):17–47, 1993.
-  Danny Dolev, Nancy A. Lynch, Shlomit S. Pinter, Eugene W. Stark, and William E. Weihl. Reaching approximate agreement in the presence of faults. J. ACM, 33:499–516, May 1986. URL: http://doi.acm.org/10.1145/5925.5931, doi:http://doi.acm.org/10.1145/5925.5931.
-  Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. In Proceedings of the fourth annual ACM symposium on Principles of distributed computing, PODC ’85, pages 59–70, New York, NY, USA, 1985. ACM. URL: http://doi.acm.org/10.1145/323596.323602, doi:http://doi.acm.org/10.1145/323596.323602.
-  Rachid Guerraoui and Bastian Pochon. The complexity of early deciding set agreement: How can topology help? Electronic Notes in Theoretical Computer Science, 230:71 – 78, 2009. Proceedings of the Workshops on Geometric and Topological Methods in Concurrency Theory (GETCO 2004+2005+2006). URL: http://www.sciencedirect.com/science/article/pii/S157106610900022X, doi:https://doi.org/10.1016/j.entcs.2009.02.017.
-  A. Jadbabaie, Jie Lin, and A.S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. Automatic Control, IEEE Transactions on, 48(6):988 – 1001, June 2003. doi:10.1109/TAC.2003.812781.
-  Denis Jeanneau, Thibault Rieutord, Luciana Arantes, and Pierre Sens. Solving k-set agreement using failure detectors in unknown dynamic networks. IEEE Transactions on Parallel and Distributed Systems, 28(5):1484–1499, May 2017.
-  H. LeBlanc, H. Zhang, X. Koutsoukos, and S. Sundaram. Resilient asymptotic consensus in robust networks. IEEE Journal on Selected Areas in Communications: Special Issue on In-Network Computation, 31:766–781, April 2013.
-  Heath LeBlanc, Haotian Zhang, Shreyas Sundaram, and Xenofon Koutsoukos. Consensus of multi-agent networks in the presence of adversaries using only local information. HiCoNs, 2012.
-  Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
-  Alexandre Maurer, Sébastien Tixeuil, and Xavier Défago. Reliable communication in a dynamic network in the presence of Byzantine faults. CoRR, abs/1402.0121, 2014. URL: http://arxiv.org/abs/1402.0121.
-  Mikhail Nesterenko and Sébastien Tixeuil. Discovering network topology in the presence of byzantine faults. In Structural Information and Communication Complexity, pages 212–226, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
-  A. Pagourtzis, G. Panagiotakos, and D. Sakavalas. Reliable broadcast with respect to topology knowledge. In Proceedings of the 28th international conference on Distributed computing (DISC), 2014.
-  Aris Pagourtzis, Giorgos Panagiotakos, and Dimitris Sakavalas. Reliable broadcast with respect to topology knowledge. Distributed Computing, 30(2):87–102, 2017. URL: https://doi.org/10.1007/s00446-016-0279-6, doi:10.1007/s00446-016-0279-6.
-  Aris Pagourtzis, Giorgos Panagiotakos, and Dimitris Sakavalas. Reliable communication via semilattice properties of partial knowledge. In Fundamentals of Computation Theory - 21st International Symposium, FCT 2017, Bordeaux, France, September 11-13, 2017, Proceedings, pages 367–380, 2017. URL: https://doi.org/10.1007/978-3-662-55751-8_29, doi:10.1007/978-3-662-55751-8_29.
-  M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, April 1980. URL: http://doi.acm.org/10.1145/322186.322188, doi:10.1145/322186.322188.
-  David Peleg. Local majorities, coalitions and monopolies in graphs: a review. Theor. Comput. Sci., 282(2):231–257, 2002. URL: https://doi.org/10.1016/S0304-3975(01)00055-X, doi:10.1016/S0304-3975(01)00055-X.
-  Dimitris Sakavalas, Lewis Tseng, and Nitin H. Vaidya. Asynchronous crash-tolerant approximate consensus in directed graphs: Topology knowledge. CoRR, abs/1803.04513, 2018. URL: http://arxiv.org/abs/1803.04513, arXiv:1803.04513.
-  Bhavani Shankar, Prasant Gopal, Kannan Srinathan, and C. Pandu Rangan. Unconditionally reliable message transmission in directed networks. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, SODA ’08, pages 1048–1055, Philadelphia, PA, USA, 2008. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1347082.1347197.
-  Lili Su and Nitin Vaidya. Reaching approximate Byzantine consensus with multi-hop communication. In Andrzej Pelc and Alexander A. Schwarzmann, editors, Stabilization, Safety, and Security of Distributed Systems, volume 9212 of Lecture Notes in Computer Science, pages 21–35. Springer International Publishing, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21741-3_2, doi:10.1007/978-3-319-21741-3_2.
-  Lewis Tseng, Nitin Vaidya, and Vartika Bhandari. Broadcast using certified propagation algorithm in presence of Byzantine faults. Information Processing Letters, 115(4):512 – 514, 2015. URL: http://www.sciencedirect.com/science/article/pii/S0020019014002609, doi:http://dx.doi.org/10.1016/j.ipl.2014.11.010.
-  Lewis Tseng and Nitin H. Vaidya. Fault-tolerant consensus in directed graphs. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC ’15, pages 451–460, New York, NY, USA, 2015. ACM. URL: http://doi.acm.org/10.1145/2767386.2767399, doi:10.1145/2767386.2767399.
-  Lewis Tseng and Nitin H. Vaidya. Iterative approximate Byzantine consensus under a generalized fault model. In International Conference on Distributed Computing and Networking (ICDCN), January 2013.
-  Nitin H. Vaidya, Lewis Tseng, and Guanfeng Liang. Iterative approximate Byzantine consensus in arbitrary directed graphs. In Proceedings of the thirty-first annual ACM symposium on Principles of distributed computing, PODC ’12. ACM, 2012.
-  Kyrill Winkler, Manfred Schwarz, and Ulrich Schmid. Consensus in directed dynamic networks with short-lived stability. CoRR, abs/1602.05852, 2016. URL: http://arxiv.org/abs/1602.05852, arXiv:1602.05852.
-  H. Zhang and S. Sundaram. Robustness of complex networks with implications for consensus and contagion. In Proceedings of CDC 2012, the 51st IEEE Conference on Decision and Control, 2012.
-  H. Zhang and S. Sundaram. Robustness of distributed algorithms to locally bounded adversaries. In Proceedings of ACC 2012, the 31st American Control Conference, 2012.
Appendix A Additional Discussion of Related Work
Lamport, Shostak, and Pease addressed the Byzantine consensus problem in . Subsequent work [20, 17] characterized the necessary and sufficient conditions under which Byzantine consensus is solvable in undirected graphs. However, these conditions are not adequate to fully characterize the directed graphs in which Byzantine consensus is feasible.
Bansal et al.  identified tight conditions for achieving Byzantine consensus in undirected graphs using authentication. Bansal et al. discovered that all-pair reliable communication is not necessary to achieve consensus when using authentication. Our work differs from Bansal et al. in that our results apply in the absence of authentication or any other security primitives; also our results apply to directed graphs. Alchieri et al.  explored the problem of achieving exact consensus in unknown networks with Byzantine nodes, but the underlying communication graph is assumed to be fully-connected. In our work, each node has partial network knowledge, and we consider incomplete directed graphs.
A.2 Iterative Approximate Consensus
Many researchers in the decentralized control area, including Bertsekas and Tsitsiklis  and Jadbabaie, Lin and Morse , have explored approximate consensus in the absence of faults, using only near-neighbor communication in systems wherein the communication graph may be partially connected and time-varying. Our work considers the case when nodes may suffer crash failures.
Our prior work [40, 39, 36] has considered a restricted class of iterative algorithms for achieving approximate Byzantine consensus in directed graphs, where fault-free nodes must agree on values that are approximately equal to each other using iterative algorithms with limited memory (in particular, the state carried by the nodes across iterations must be in the convex hull of inputs of the fault-free nodes, which precludes mechanisms such as multi-hop forwarding of messages). The conditions developed in such prior work are not necessary when no such restrictions are imposed. Independently, LeBlanc et al. [25, 24], and Zhang and Sundaram [43, 42] have developed results for iterative algorithms for approximate consensus under a weaker fault model, where a faulty node must send identical messages to all the neighbors.
A.3 k-set Consensus
k-set consensus has also received a lot of attention under different graph assumptions. In complete graphs, Biely et al.  presented impossibility results for k-set consensus in various message passing systems. Guerraoui and Pochon  studied early-deciding k-set agreement using algebraic topology techniques. Our work studies directed incomplete graphs. In synchronous dynamic networks, Biely et al. [8, 9] considered k-set consensus with fault-free nodes. Winkler et al.  solved exact consensus in synchronous dynamic networks with unreliable links. The main contribution in  was to identify the shortest period of stability that makes consensus feasible. In unknown and dynamic systems, Jeanneau et al.  relied on failure detectors to solve k-set consensus. These works only studied synchronous systems, whereas we consider exact and approximate crash-tolerant consensus in asynchronous systems. Moreover, we do not assume the existence of failure detectors.
A.4 Reliable Communication and Broadcast
Several papers have also addressed communication between a single source-receiver pair. Dolev et al.  studied the problem of secure communication, which achieves both fault-tolerance and perfect secrecy between a single source-receiver pair in undirected graphs, in the presence of node and link failures. Desmedt and Wang considered the same problem in directed graphs . Shankar et al.  investigated reliable communication between a source-receiver pair in directed graphs allowing for an arbitrarily small error probability in the presence of Byzantine failures. Maurer et al. explored the problem in directed dynamic graphs. In our work, we do not consider secrecy, and address the consensus problem rather than the single source-receiver pair problem. Moreover, our work addresses both deterministically correct and randomized algorithms for consensus.
There has also been work [29, 37] on the problem of achieving reliable broadcast with a fault-free source in the presence of local Byzantine faults, which established tight conditions on the underlying graphs. In this paper, we consider the consensus problem instead of the reliable broadcast problem; furthermore, we allow any node to be faulty.
Appendix B Necessity of Condition k-CCA
The necessity proof is similar to the necessity proof of Condition CCA in .
If graph does not satisfy Condition -CCA, then no iterative one-hop algorithm can achieve asynchronous approximate consensus in .
The proof is by contradiction. Suppose that there exists an iterative one-hop algorithm which achieves asynchronous approximate consensus in , and does not satisfy Condition -CCA. That is, there exists a node partition such that are non-empty, and .
Let denote the set of nodes that have outgoing links to nodes in , i.e., . Similarly define . Since and , we have that for every , and for every , .
Consider a scenario where (i) each node in has input 0; (ii) each node in has input ; (iii) nodes in (if non-empty) have arbitrary inputs in ; (iv) no node crashes; and (v) the message delay for communication channels from to and from to is arbitrarily large compared to all the other channels.
Consider nodes in . Since messages from the set take arbitrarily long to arrive at the nodes in , and for every , , from the perspective of node , its incoming neighbors in appear to have crashed. The latter follows from the fact that algorithm is one-hop, i.e., the case that for every , cannot be excluded by the messages exchanged in and thus there is a case where all their neighbors in have crashed. Thus, nodes in must decide on their output without waiting to hear from the nodes in . Consequently, to satisfy the validity property, the output at each node in has to be 0, since 0 is the input of all the nodes in . Similarly, nodes in must decide their output without hearing from the nodes in ; they must choose output as , because the input at all the nodes in is . Thus, the -agreement property is violated, since the difference between outputs at fault-free nodes is not . This is a contradiction. ∎
Appendix C Sufficiency of Condition k-CCA
We first prove a useful lemma.
Assume that satisfies Condition -CCA. Consider a partition of such that and are non-empty. If , then set propagates to set .
Since are non-empty, and , we have that holds, by setting in Condition -CCA.
Define and . Now, for a suitable , we will build propagating sequences and inductively.
Recall that and . Since , . Define and .
If , then , and we have found the propagating sequence already.
If , then define , and . Since , . Therefore, Condition -CCA implies that . That is, .
For increasing values of , given and , where , by following steps similar to the previous item, we can obtain and , such that either or .
In the above construction, is the smallest index such that . ∎
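The inductive construction above can be sketched in code as a loop that repeatedly moves nodes from the second set into the first. This is only an illustration under stated assumptions: the names `influenced` and `propagates`, the dictionary of incoming neighbors, and the rule "a node moves once it has at least `threshold` incoming neighbors in the current first set" are hypothetical stand-ins for the paper's one-hop relation (the multi-hop case would need path-based counting instead), and `threshold` is left as a parameter rather than fixed to a value from the paper.

```python
def influenced(in_nbrs, A, B, threshold):
    """Nodes of B with at least `threshold` incoming neighbors in A
    (assumed one-hop influence rule)."""
    return {v for v in B if len(in_nbrs.get(v, set()) & A) >= threshold}

def propagates(in_nbrs, A, B, threshold):
    """Build the sequences A_0, A_1, ... and B_0, B_1, ... step by step.
    Returns the number of steps until B becomes empty, or None if the
    construction gets stuck (no node of B can be moved)."""
    A, B = set(A), set(B)
    steps = 0
    while B:
        moved = influenced(in_nbrs, A, B, threshold)
        if not moved:
            return None
        A |= moved   # A_{l+1} = A_l plus the influenced nodes
        B -= moved   # B_{l+1} = B_l minus the influenced nodes
        steps += 1
    return steps

# Hypothetical 4-node example: node 3 hears from 1 and 2, node 4 from 2 and 3.
in_nbrs = {3: {1, 2}, 4: {2, 3}}
print(propagates(in_nbrs, {1, 2}, {3, 4}, threshold=2))
print(propagates(in_nbrs, {1}, {2, 3, 4}, threshold=2))
```

In the first call node 3 is absorbed first and node 4 only afterwards, which mirrors the step-by-step growth of the first sequence; in the second call no node can ever be moved, so the sketch reports that no propagating sequence exists under the assumed rule.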
Proof of Lemma 3.1
Consider two cases:
: Then by Lemma C above, propagates to , completing the proof.
: In this case, consider two sub-cases:
propagates to : The proof in this case is complete.
does not propagate to : Recall that . Since does not propagate to , propagating sequences defined in Definition 3.1 do not exist in this case. More precisely, there must exist , and sets