1 Introduction
The fault-tolerant consensus problem proposed by Lamport et al. [32] has been studied extensively under different point-to-point network models, including complete networks (e.g., [32, 19, 1]) and undirected networks (e.g., [20, 17]). Recently, many works have explored various consensus problems in directed networks, e.g., [11, 8, 9, 27, 13], including our own work [38, 40, 36]. More precisely, these works address the problem in incomplete directed networks, i.e., not every pair of nodes is connected by a channel, and the channels are not necessarily bidirectional. We will often use the terms graph and network interchangeably. In this work, we explore the crash-tolerant approximate consensus problem in asynchronous incomplete networks under different restrictions on topology knowledge (we assume that each node knows all its neighbors at distance of at most $k$ hops) and relay depth (the maximum number of hops that information, or a message, can be propagated). These constraints are common in large-scale networks to avoid memory overload and network congestion, e.g., neighbor tables and the Time-to-Live (TTL) field (or hop limit) in the Internet Protocol. We consider both undirected and directed graphs in this paper.
Motivation Prior results [38, 13] showed that exact crash-tolerant consensus is solvable in synchronous networks with only one-hop knowledge and relay depth $1$, i.e., each node only needs to know its immediate incoming and outgoing neighbors, and no message needs to be relayed (or forwarded). Such a local algorithm is of interest in practice due to its low deployment cost and low message complexity in each round. In asynchronous undirected networks, there exists a simple flooding-based algorithm adapted from [20, 17] that achieves approximate consensus with up to $f$ crash faults if the network has node-connectivity at least $f+1$ (for brevity, we will simply use the term “connectivity” below) and $n > 2f$, where $n$ is the number of nodes. However, these two conditions are not sufficient for an iterative algorithm with one-hop knowledge and relay depth $1$, in which each node maintains a state and exchanges state values with only its one-hop neighbors in each iteration.
Consider Figure 1(a), which is a ring network of four nodes. There is no iterative algorithm with one-hop knowledge and relay depth $1$ that tolerates one crash fault. The adversary can divide the nodes into two disjoint sets, each consisting of two adjacent nodes, such that the communication delay across the sets is so large that, for each of the two edges crossing between the sets, each endpoint believes that the other endpoint has crashed. As a result, no exchange of state values is possible across the sets in the execution; hence, consensus is not possible (a more precise discussion appears in Section 3). On the other hand, suppose each node has two-hop knowledge, i.e., complete topology knowledge in this network, and relay depth $2$. Then each node knows that it will be able to receive state values from at least two of the other nodes, since the node connectivity is $2$ and at most one node may fail. Following this observation, it is easy to design a flooding-based algorithm for the ring network based on [20, 17]. This example shows that both topology knowledge and relay depth affect the feasibility of asynchronous approximate consensus.
Interestingly, increasing connectivity alone does not make an iterative algorithm feasible. In Section 5.1, we show that no fault-tolerant approximate consensus algorithm with one-hop topology knowledge and relay depth $1$ exists in the network in Figure 1(b), which consists of two sparsely connected cliques despite its higher connectivity. Motivated by these observations, this work addresses the following question in asynchronous systems:
What is a tight condition on the underlying communication graphs for achieving approximate consensus if each node has only $k$-hop topology knowledge and relay depth $k'$?
Approximate Consensus We focus on the asynchronous approximate consensus problem. The system consists of $n$ nodes, of which at most $f$ may crash. Each node is given an input, and after a finite amount of time, each fault-free node should produce an output that satisfies validity and agreement conditions (formally defined later). Intuitively, the states at fault-free nodes must be in the range of all the inputs, and are guaranteed to be within $\epsilon$ of each other, for any given $\epsilon > 0$, after a sufficiently large number of rounds. (In the literature, this is also called asymptotic consensus; here, we use the term “approximate consensus” following [19, 38].)
In [38], we presented Condition CCA (defined in Section 2) and showed that it is necessary and sufficient on the underlying directed graphs for achieving approximate consensus in asynchronous systems. The approximate consensus algorithms in prior work [38, 20, 17] are based on flooding (i.e., relay depth $n$) and assume that each node has $n$-hop topology knowledge. However, such algorithms are not practical in a large-scale network: nodes' local memory may not be large enough to store the entire network, flooding-based algorithms (e.g., [38, 20, 17]) incur prohibitively high message overhead in each phase, and complete topology knowledge may require a high deployment and configuration cost. Therefore, we explore algorithms that only require “local” knowledge and limited message relay.
Contributions We identify tight conditions on the graphs under different assumptions on topology knowledge and relay depth. In particular, we have the following results:


Limited Topology Knowledge and Relay Depth (Section 3): We consider the case with $k$-hop topology knowledge and relay depth $k$. The family of algorithms that captures these constraints is that of iterative $k$-hop algorithms: nodes only have topology knowledge of their $k$-hop neighborhoods, and propagate state values to nodes that are at most $k$ hops away. Note that no other information is relayed. For iterative $k$-hop algorithms, we derive a family of tight conditions, namely Condition $k$-CCA for $1 \le k \le n$, for solving approximate consensus in directed networks. To prove the tightness of the conditions, we propose a family of iterative algorithms called $k$-LocWA and show how the convergence time and the message complexity of these algorithms are affected by $k$, providing the respective upper bounds.

Topology Discovery and Unlimited Relay Depth (Section 4): We consider the case with one-hop topology knowledge and relay depth $n$. In other words, nodes initially only know their immediate incoming and outgoing neighbors, but nodes can flood the network, learn (some part of) the topology, and eventually solve consensus based on the learned topology. We show that Condition CCA from [38] is also sufficient in this case. Since we assume only one-hop knowledge, our result implies that Condition CCA is tight for any $k$-hop topology knowledge when the relay depth is unlimited. One contribution that may be of independent interest is a topology discovery mechanism to learn and “estimate” the topology in asynchronous directed networks with crash faults. Such a discovery mechanism will be useful for self-stabilization and reconfiguration of a large-scale system.
In Section 5, we discuss fault-tolerance implications of the derived conditions and Condition CCA. We also discuss how to speed up our algorithms in terms of real-time delay.
Related Work There is a large body of work on fault-tolerant consensus. Here, we discuss related works exploring consensus under different assumptions on graphs. Fischer et al. [20] and Dolev [17] characterized necessary and sufficient conditions under which Byzantine consensus is solvable in undirected graphs. In synchronous systems, Charron-Bost et al. [11, 12] solved approximate crash-tolerant consensus in dynamic directed networks using local averaging algorithms, and in the asynchronous setting, Charron-Bost et al. [11, 12] addressed approximate consensus with crash faults in complete graphs, which are necessarily undirected. We solve the problem in incomplete directed graphs in asynchronous systems. Moreover, in [11, 12], nodes are constrained to have only one-hop topology knowledge. We study different types of algorithms, including ones that allow nodes to learn the topology (i.e., we allow topology discovery).
There were also works studying limited topology knowledge. Su and Vaidya [36] identified the condition for solving synchronous Byzantine consensus using a variation of $k$-hop algorithms. Alchieri et al. [2] studied the synchronous Byzantine problem under unknown participants. We consider asynchronous systems in this work. Nesterenko and Tixeuil [28] studied the topology discovery problem in the presence of Byzantine faults in undirected networks, whereas we present a solution that works in directed networks with crash faults.
Extensive prior work has studied graph properties for other similar problems, such as (i) Byzantine approximate consensus in directed graphs using “local averaging” algorithms, wherein nodes only have one-hop neighborhood knowledge (e.g., [40, 39, 36, 24, 43, 42, 16]); (ii) Byzantine consensus with unknown participants [2]; (iii) Byzantine consensus with authentication in undirected networks [4]; and (iv) consensus problems in synchronous dynamic networks, where the adversary can change the network topology. The papers in (i)-(iii) only consider synchronous systems, and our algorithms and analysis are significantly different from those developed for Byzantine algorithms. In the line of work on dynamic networks, impossibility results for Consensus and Set Agreement are given in [7, 10], and sufficiency is guaranteed by requiring a period of stability, during which certain nodes are strongly connected; the first tight condition for the feasibility of consensus and broadcast is presented in [14]. Additionally, in [3], Byzantine corruptions and a dynamic node set are assumed, and a randomized algorithm is presented. Our work differs from all these works because of the assumption of asynchronous systems and limited topology information. Please refer to our technical report [34] for further discussion of these works.
2 Preliminary
Before presenting the results, we introduce our systems model, some terminology, and our prior results from [38] to facilitate the discussion.
System Model The point-to-point message-passing network is static, and it is represented by a simple directed graph $G(V, E)$, where $V$ is the set of $n$ nodes, and $E$ is the set of directed edges between the nodes in $V$. The communication links are reliable. We assume that $n \ge 2$, since the consensus problem for $n = 1$ is trivial. Node $i$ can transmit messages to another node $j$ directly if directed edge $(i, j)$ is in $E$. Each node can send messages to itself as well; however, for convenience, we exclude self-loops from set $E$. We will use the terms edge and link interchangeably.
Up to $f$ nodes may suffer crash failures in an execution. A node that suffers a crash failure simply stops taking steps (i.e., the fail-stop model). We consider asynchronous message-passing communication, in which a message may be delayed arbitrarily but is eventually delivered if the receiver node is fault-free. We assume that the adversary controls both the crashing of nodes and the delaying of messages at any point in time during the execution.
Terminology Upper case letters are used to name sets. Lower case italic letters are used to name nodes. All paths used in our discussion are directed paths.
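Node $i$ is said to be an incoming neighbor of node $j$ if $(i, j) \in E$. Let $N_j^-$ be the set of incoming neighbors of node $j$, i.e., $N_j^- = \{ i \mid (i, j) \in E \}$. Define $N_j^+$ as the set of outgoing neighbors of node $j$, i.e., $N_j^+ = \{ i \mid (j, i) \in E \}$.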
For set $B \subseteq V$, node $i$ is said to be an incoming neighbor of set $B$ if $i \notin B$ and there exists $j \in B$ such that $(i, j) \in E$. Given subsets of nodes $A$ and $B$, set $B$ is said to have $c$ incoming neighbors in set $A$ if $A$ contains $c$ distinct incoming neighbors of $B$. Given disjoint nonempty subsets of nodes $A$ and $B$, we write $A \Rightarrow B$ if $B$ has at least $f+1$ distinct incoming neighbors in $A$. When it is not true that $A \Rightarrow B$, we denote that fact by $A \not\Rightarrow B$.
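The relation $A \Rightarrow B$ can be checked mechanically. Below is a minimal Python sketch, assuming a graph stored as a dict that maps each node to the set of its outgoing neighbors (our own representation, not one used in the paper); the function names are hypothetical.

```python
def incoming_neighbors_of_set(graph, B):
    """Nodes outside B that have a directed edge into some node of B."""
    B = set(B)
    return {i for i in graph if i not in B and graph[i] & B}


def implies(graph, A, B, f):
    """A => B: set B has at least f + 1 distinct incoming neighbours in A."""
    return len(incoming_neighbors_of_set(graph, B) & set(A)) >= f + 1


# A bidirectional four-node ring a - b - c - d - a, stored as out-neighbour sets.
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(implies(ring, {"a", "b"}, {"c", "d"}, f=1))  # True: edges b -> c and a -> d
print(implies(ring, {"a"}, {"b", "c", "d"}, f=1))  # False: only a enters the set
```

Counting incoming neighbors of the whole set $B$ (rather than of a single node) is what distinguishes $\Rightarrow$ from the local relations used later for iterative algorithms.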
Approximate Consensus For the approximate consensus problem (e.g., [19, 26, 38]), it is usually assumed that each node $i$ maintains a state $v_i$, with $v_i[t]$ denoting the state of node $i$ at the end of phase (or iteration) $t$. The initial state of node $i$, $v_i[0]$, is equal to the initial input $x_i$ provided to node $i$. At the start of phase $t$ ($t \ge 1$), the state of node $i$ is $v_i[t-1]$.
Let $M[t]$ and $m[t]$ be the maximum and the minimum state at nodes that have not crashed by the end of phase $t$. Then, a correct approximate consensus algorithm needs to satisfy the following two conditions:


Validity: $\forall t > 0$, $M[t] \le M[0]$ and $m[t] \ge m[0]$; and

Convergence: $\lim_{t \to \infty} \left( M[t] - m[t] \right) = 0$.
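Equivalently, the Convergence condition can be stated as:
$\forall \epsilon > 0$, $\exists t_\epsilon$ such that $M[t] - m[t] < \epsilon$ for all $t \ge t_\epsilon$.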
To facilitate the study of the number of phases needed for convergence and of the corresponding message complexity, convergence with respect to a specific $\epsilon$ must be considered. Therefore, we will also use the following convergence notion.


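$\epsilon$-Convergence: for a given $\epsilon > 0$, $\exists t_\epsilon$ such that $M[t] - m[t] < \epsilon$ for all $t \ge t_\epsilon$.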
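On a finite execution trace, Validity can be checked directly, while $\epsilon$-Convergence can only be witnessed up to the recorded horizon (the definition quantifies over all later phases). The sketch below assumes a hypothetical trace format in which `history[t]` maps each node not crashed by the end of phase $t$ to $v_i[t]$; the helper names are our own.

```python
def spread_profile(history):
    """Return the lists M[t] and m[t] over the recorded phases."""
    M = [max(states.values()) for states in history]
    m = [min(states.values()) for states in history]
    return M, m


def check_validity(history):
    """Validity: M[t] <= M[0] and m[t] >= m[0] for every recorded t > 0."""
    M, m = spread_profile(history)
    return all(M[t] <= M[0] and m[t] >= m[0] for t in range(1, len(history)))


def first_eps_phase(history, eps):
    """Smallest t with M[t'] - m[t'] < eps for every recorded t' >= t, else None.

    This only witnesses eps-convergence within the trace; it cannot prove
    that the spread stays below eps in phases that were not recorded.
    """
    M, m = spread_profile(history)
    t_eps = None
    for t in range(len(history) - 1, -1, -1):
        if M[t] - m[t] < eps:
            t_eps = t
        else:
            break
    return t_eps
```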
Prior Result
In [38], we identified necessary and sufficient conditions on the underlying communication graphs for achieving crash-tolerant consensus in directed networks.
The theorem below requires the communication graph to satisfy Condition CCA (Crash-Consensus-Asynchronous).
[from [38]]
Approximate crash-tolerant consensus in asynchronous systems is feasible iff for any partition $L, R$ of $V$, where $L$ and $R$ are both nonempty,
either $L \Rightarrow R$ or $R \Rightarrow L$. (Condition CCA)
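For small graphs, Condition CCA can be verified by brute force over all partitions $L, R$ of $V$. The cost is exponential in $n$, so this is only an illustration, not a practical test. The sketch assumes the same dict-of-outgoing-neighbor-sets encoding; `partitions` and `satisfies_cca` are hypothetical helper names.

```python
from itertools import combinations


def incoming_neighbors_of_set(graph, B):
    """Nodes outside B with a directed edge into some node of B."""
    B = set(B)
    return {i for i in graph if i not in B and graph[i] & B}


def implies(graph, A, B, f):
    """A => B: B has at least f + 1 distinct incoming neighbours in A."""
    return len(incoming_neighbors_of_set(graph, B) & set(A)) >= f + 1


def partitions(nodes):
    """Yield each unordered partition {L, R} into two nonempty sets once."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    for size in range(len(rest)):          # size == len(rest) would empty R
        for extra in combinations(rest, size):
            L = {first, *extra}
            yield L, set(nodes) - L


def satisfies_cca(graph, f):
    """Condition CCA: every partition has L => R or R => L."""
    return all(implies(graph, L, R, f) or implies(graph, R, L, f)
               for L, R in partitions(graph))


ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(satisfies_cca(ring, 1))  # True
print(satisfies_cca(ring, 2))  # False
```

Fixing the smallest node inside $L$ enumerates each unordered partition exactly once, which suffices because the condition is symmetric in $L$ and $R$.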
3 Limited Topology Knowledge and Relay Depth
In this section, we study how topology knowledge and relay depth affect the tight conditions on the directed communication network. In particular, we consider the case with $k$-hop topology knowledge and relay depth $k$ for $1 \le k \le n$. Prior works (e.g., [38, 20, 17]) assumed that each node has $n$-hop topology knowledge and relay depth $n$. However, in large-scale networks, such an assumption may not be realistic. Therefore, we are interested in algorithms that only require nodes to exchange a small amount of information within their local neighborhood (e.g., [33, 30, 31]). Another benefit is that these algorithms do not require flooding [38] or all-to-all communication [20, 17] in each asynchronous phase.
We are interested in iterative $k$-hop algorithms: nodes only have topology knowledge of their $k$-hop neighborhoods, and propagate state values to nodes that are at most $k$ hops away. We introduce a family of conditions, namely Condition $k$-CCA for $1 \le k \le n$, which we prove necessary and sufficient for achieving asynchronous approximate consensus through iterative $k$-hop algorithms. The results presented in this section also show how $k$ affects the tight conditions on directed networks: a lower $k$ requires higher connectivity of the underlying communication network.
To the best of our knowledge, two prior papers [2, 36] examined a similar problem, namely synchronous Byzantine consensus. In [36], Su and Vaidya identified the condition under different relay depths. Alchieri et al. [2] studied the problem under unknown participants. The technique developed for asynchronous consensus in this section is significantly different.
Iterative $k$-hop Algorithms The iterative algorithms considered here have relay depth $k$ and require each node $i$ to perform the following three steps in asynchronous phase $t$:
1. Transmit: Transmit messages of the form $(v_i[t-1], i, t)$ to nodes that are reachable from node $i$ via paths of at most $k$ hops, where $v_i[t-1]$ is the current state value. If node $i$ is an intermediate node on the route of some message, then node $i$ forwards that message as instructed by the source;
2. Receive: Receive messages from the nodes that can reach node $i$ via paths of at most $k$ hops. Denote by $R_i[t]$ the set of messages that node $i$ received in phase $t$; and
3. Update: Update the state using a transition function $Z_i$, where $Z_i$ is part of the specification of the algorithm and takes as input the set $R_i[t]$, i.e., $v_i[t] := Z_i(R_i[t])$.
Note that (i) no exchange of topology information takes place in this class of algorithms, and (ii) each node's state only propagates within its $k$-hop neighborhood. For a node $i$, its $k$-hop incoming neighbors are defined as the nodes $j \neq i$ that are connected to $i$ by a directed path in $G$ with at most $k$ hops. The notion of $k$-hop outgoing neighbors is defined similarly.
Technique The algorithms presented in this section are motivated by prior work [19, 36] including our own work [38]. The algorithms are iterative and simple; thus, the proof structure shares some similarity with prior work [19, 38, 40].
Generally speaking, the proof proceeds as follows: (i) nodes are divided into two disjoint sets, say $L$ and $R$, so that nodes have “closer” state values within each set; (ii) because each node receives an adequate set of messages, we show that under any delay and crash scenario, at least one non-crashed node in either $L$ or $R$ will receive a message from the other set in each phase; and (iii) after enough phases, the values of all non-crashed nodes in either $L$ or $R$ will move “closer” to the values in the other set. The two key novelties are identifying the “adequate set” of messages that needs to be received before updating the local state in each asynchronous phase, and showing that with limited hop propagation, some node is still able to receive messages from the other set (in step (ii) above).
3.1 Case $k = 1$
To initiate the study, we first consider the one-hop case, where each node only knows its one-hop incoming and outgoing neighbors. The following notion is crucial for the characterization of graphs in which asynchronous approximate consensus is feasible with relay depth $1$.
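[$\Rightarrow_1$] Given disjoint nonempty subsets of nodes $A$ and $B$, we will use the notation $A \Rightarrow_1 B$ if there exists a node $v$ in $B$ such that $v$ has at least $f+1$ distinct incoming neighbors in $A$. When it is not true that $A \Rightarrow_1 B$, we will denote that fact by $A \not\Rightarrow_1 B$.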
Condition 1-CCA, presented below, proves to be necessary and sufficient for achieving asynchronous approximate consensus with relay depth $1$.
[Condition 1-CCA] For any partition $L, R$ of $V$, where $L$ and $R$ are both nonempty, either $L \Rightarrow_1 R$ or $R \Rightarrow_1 L$.
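Condition 1-CCA differs from Condition CCA only in requiring the $f+1$ incoming neighbors to point at a single node of the receiving set. The brute-force sketch below (hypothetical helper names, same dict-of-outgoing-neighbor-sets encoding) shows the gap on our encoding of the four-node ring discussed in the introduction: with $f = 1$, Condition CCA holds there, but Condition 1-CCA fails, because for the partition into two pairs of adjacent nodes every node has only one incoming neighbor across the cut.

```python
from itertools import combinations


def implies_1(graph, A, B, f):
    """A =>_1 B: some single node v in B has at least f + 1 in-neighbours in A."""
    A = set(A)
    return any(sum(1 for u in A if v in graph[u]) >= f + 1 for v in B)


def partitions(nodes):
    """Yield each unordered partition {L, R} into two nonempty sets once."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            L = {first, *extra}
            yield L, set(nodes) - L


def satisfies_1cca(graph, f):
    """Condition 1-CCA by exhaustive search (small graphs only)."""
    return all(implies_1(graph, L, R, f) or implies_1(graph, R, L, f)
               for L, R in partitions(graph))


ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(satisfies_1cca(ring, 1))  # False: {a, b} vs {c, d} fails both ways
```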
The necessity of Condition 1-CCA is similar to the necessity proof of Condition CCA in [38] and is presented in Appendix B. For sufficiency, we present Algorithm 1-LocWA (Local-Wait-Average) below, which is inspired by Algorithm WA [38] and utilizes only one-hop information. Recall that, by definition, no message relay with depth greater than $1$ is allowed. In Algorithm 1-LocWA, $\mathit{heard}_i[t]$ is the set of one-hop incoming neighbors of $i$ from which $i$ has received values during phase $t$. Each node $i$ performs the averaging operation to update its state value when Condition 1-WAIT below holds for the first time in phase $t$.
Condition 1-WAIT: The condition is satisfied at node $i$ in phase $t$ when $|N_i^- \setminus \mathit{heard}_i[t]| \le f$, i.e., when $i$ has not received values from a set of at most $f$ incoming neighbors.
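Algorithm 1-LocWA for node $i$
$v_i[0] := x_i$ // input at node $i$
For phase $t \ge 1$:
* On entering phase $t$:
$R_i[t] := \{v_i[t-1]\}$; $\mathit{heard}_i[t] := \emptyset$
Send message $(v_i[t-1], i, t)$ to all the outgoing neighbors
* When message $(h, j, t)$ is received for the first time:
$R_i[t] := R_i[t] \cup \{h\}$ // $R_i[t]$ is a multiset
$\mathit{heard}_i[t] := \mathit{heard}_i[t] \cup \{j\}$
* When Condition 1-WAIT holds for the first time in phase $t$:
$v_i[t] := \frac{\sum_{h \in R_i[t]} h}{|R_i[t]|}$ (1)
Enter phase $t+1$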
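The per-node logic of 1-LocWA can be sketched as an event handler. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the class name is hypothetical; we assume the update in equation (1) is the plain average of the node's own previous state and the values received in the current phase; messages for future phases are buffered until the node reaches that phase; and `max_phase` is an artificial stopping point added only so that the sketch terminates.

```python
class LocWA1Node:
    """Event-driven sketch of 1-LocWA at one node (hypothetical helper class).

    Messages are tuples (h, j, t): value h sent by node j in phase t.
    """

    def __init__(self, node_id, in_neighbors, f, x, max_phase):
        self.id = node_id
        self.in_neighbors = frozenset(in_neighbors)   # N_i^-
        self.f = f
        self.max_phase = max_phase                    # assumption: stop here
        self.v = {0: x}                               # v[t] = v_i[t]
        self.t = 1                                    # current phase
        self.inbox = {}                               # inbox[t][j] = h

    def start(self):
        """Message sent on entering phase 1, plus any phases finished at once."""
        return [(self.v[0], self.id, 1)] + self._try_update()

    def receive(self, msg):
        """Record (h, j, t) and return the messages this node must now send."""
        h, j, t = msg
        if j in self.in_neighbors:
            self.inbox.setdefault(t, {}).setdefault(j, h)   # first copy only
        return self._try_update()

    def _one_wait(self):
        # Condition 1-WAIT: at most f in-neighbours not yet heard in phase t.
        heard = set(self.inbox.get(self.t, {}))
        return len(self.in_neighbors - heard) <= self.f

    def _try_update(self):
        out = []
        while self.t <= self.max_phase and self._one_wait():
            got = self.inbox.get(self.t, {})
            values = [self.v[self.t - 1], *got.values()]
            self.v[self.t] = sum(values) / len(values)     # averaging step
            self.t += 1
            if self.t <= self.max_phase:
                out.append((self.v[self.t - 1], self.id, self.t))
        return out


node = LocWA1Node("a", {"b", "d"}, f=1, x=0.0, max_phase=3)
print(node.start())                  # [(0.0, 'a', 1)]
print(node.receive((10.0, "b", 1)))  # [(5.0, 'a', 2)]
```

Note that a value arriving after the node has left its phase (here, a late phase-1 value from `d`) is stored but never used, mirroring the rule that the update happens only when Condition 1-WAIT first holds.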
To prove the correctness of LocWA, we will use the supplementary definitions below.
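For disjoint sets $A, B$, $\mathit{in}(A \Rightarrow B)$ denotes the set of all the nodes in $B$ that each have at least $f+1$ incoming edges from nodes in $A$. When $A \not\Rightarrow_1 B$, define $\mathit{in}(A \Rightarrow B) = \emptyset$. Formally, $\mathit{in}(A \Rightarrow B) = \{ v \in B \mid |N_v^- \cap A| \ge f+1 \}$.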
For nonempty disjoint sets $A$ and $B$, set $A$ is said to propagate to set $B$ in $l$ steps, where $l > 0$, if there exist sequences of sets $A_0, A_1, \ldots, A_l$ and $B_0, B_1, \ldots, B_l$ (propagating sequences) such that


$A_0 = A$, $B_0 = B$, $B_l = \emptyset$, $B_\tau \neq \emptyset$ for $\tau < l$, and

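for $0 \le \tau \le l-1$, (i) $A_\tau \Rightarrow_1 B_\tau$; (ii) $A_{\tau+1} = A_\tau \cup \mathit{in}(A_\tau \Rightarrow B_\tau)$; and
(iii) $B_{\tau+1} = B_\tau \setminus \mathit{in}(A_\tau \Rightarrow B_\tau)$.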
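Observe that $A_\tau$ and $B_\tau$ form a partition of $A \cup B$, and for $\tau < l$, $\mathit{in}(A_\tau \Rightarrow B_\tau) \neq \emptyset$. We say that set $A$ propagates to set $B$ if there is a propagating sequence for some number of steps $l$ as defined above. Note that the number of steps $l$ in the above definition is upper bounded by $n - f - 1$: each step moves at least one node from $B_\tau$ to $A_{\tau+1}$, so $l \le |B|$, and set $A$ must be of size at least $f+1$ for it to propagate to $B$ (otherwise, $A \not\Rightarrow_1 B$), so $|B| \le n - f - 1$.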
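Because each step is forced (step $\tau$ moves exactly $\mathit{in}(A_\tau \Rightarrow B_\tau)$ from $B_\tau$ to $A_\tau$), whether $A$ propagates to $B$ can be decided by simply running the sequence. The sketch below uses hypothetical helper names and the dict-of-outgoing-neighbor-sets encoding; `lemma_holds` is a brute-force sanity check of the first lemma below on small graphs, not a proof.

```python
from itertools import combinations


def in_set(graph, A, B, f):
    """in(A => B): nodes of B with at least f + 1 incoming edges from A."""
    A = set(A)
    return {v for v in B if sum(1 for u in A if v in graph[u]) >= f + 1}


def propagation_steps(graph, A, B, f):
    """Return l if A propagates to B in l steps, or None if it stalls."""
    A, B = set(A), set(B)
    steps = 0
    while B:
        moved = in_set(graph, A, B, f)
        if not moved:              # A_tau does not =>_1 B_tau
            return None
        A, B = A | moved, B - moved
        steps += 1
    return steps


def partitions(nodes):
    """Yield each unordered partition {L, R} into two nonempty sets once."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            L = {first, *extra}
            yield L, set(nodes) - L


def lemma_holds(graph, f):
    """For every partition, L propagates to R or R propagates to L."""
    return all(propagation_steps(graph, L, R, f) is not None
               or propagation_steps(graph, R, L, f) is not None
               for L, R in partitions(graph))


# A five-node example (f = 1) where propagation needs n - f - 1 = 3 steps.
chain = {"a": {"c", "d"}, "b": {"c"}, "c": {"d", "e"}, "d": {"e"}, "e": set()}
print(propagation_steps(chain, {"a", "b"}, {"c", "d", "e"}, 1))  # 3
```

In the example, node `c` joins first, then `d` (which now hears from `a` and `c`), then `e`, so the bound $l \le n - f - 1$ is attained.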
Now, we present two key lemmas, whose proofs are presented in Appendix C. In the discussion below, we assume that $G$ satisfies Condition 1-CCA.
For any partition $L, R$ of $V$, where $L, R$ are both nonempty, either $L$ propagates to $R$, or $R$ propagates to $L$.
The lemma below states that the interval to which the states at all the fault-free nodes are confined shrinks after a finite number of phases of Algorithm 1-LocWA. Recall that $M[t]$ and $m[t]$ denote the maximum and minimum states at the fault-free nodes at the end of the $t$-th phase.
Suppose that at the end of the $t$-th phase of Algorithm 1-LocWA, $V$ can be partitioned into nonempty sets $R$ and $L$ such that (i) $R$ propagates to $L$ in $l$ steps, and (ii) the states of fault-free nodes in $R$ are confined to an interval of length at most $\frac{M[t] - m[t]}{2}$. Then, with Algorithm 1-LocWA,
(2) 
Using Lemma 3.1 and simple algebra, we can prove the following theorem. For the sake of space, we present only a proof sketch; the complete proof is deferred to Appendix C.
If $G$ satisfies Condition 1-CCA, then Algorithm 1-LocWA achieves both Validity and Convergence.
Proof Sketch: To prove the Convergence of 1-LocWA, we show that given any $\epsilon > 0$, there exists $\tau$ such that $M[\tau] - m[\tau] < \epsilon$. Consider the $t$-th phase, for some $t \ge 0$. If $M[t] - m[t] = 0$, then the algorithm has already converged; thus, we consider only the case where $M[t] - m[t] > 0$. In this case, we can partition $V$ into two subsets, $R$ and $L$, such that for each fault-free node $j \in R$, $v_j[t] \in \left[\frac{M[t] + m[t]}{2}, M[t]\right]$, and for each fault-free node $j \in L$, $v_j[t] \in \left[m[t], \frac{M[t] + m[t]}{2}\right)$. (The full proof in [34] identifies how to partition the nodes.) By Lemma 3.1, either $R$ propagates to $L$ or $L$ propagates to $R$. In both cases, we have found two nonempty sets $R$ (or $L$) and $L$ (or $R$) that partition $V$ and satisfy the hypothesis of Lemma 3.1, since $R$ (or $L$) propagates to the other set and the states of all its fault-free nodes are confined to an interval of length at most $\frac{M[t] - m[t]}{2}$. The theorem is then proven by using simple algebra and the fact that the interval to which the states of all the fault-free nodes are confined shrinks after a finite number of phases.
3.2 General Case
Now, consider the case when each node only knows its $k$-hop neighbors and the relay depth is $k$. In the following, we generalize the notions presented above to the $k$-hop case. For node $i$, denote by $N_i^{k-}$ the set of $i$'s $k$-hop incoming neighbors. For a set of nodes $A$, let $N_A^-$ be the set of $A$'s one-hop incoming neighbors. Formally, $N_A^- = \{ i \in V \setminus A \mid \exists j \in A, (i, j) \in E \}$. Next, we define the relation $\Rightarrow_k$ for the $k$-hop case.
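[$\Rightarrow_k$] Given disjoint nonempty subsets of nodes $A$ and $B$, we will say that $A \Rightarrow_k B$ holds if there exists a node $v$ in $B$ for which there exist at least $f+1$ node-disjoint paths of length at most $k$ from distinct nodes in $A$ to $v$. More formally, if $\mathcal{P}_v^k(A)$ is the family of all sets of node-disjoint paths of length at most $k$ (with $v$ being their only common node) initiating in $A$ and ending in node $v$, then $A \Rightarrow_k B$ means that $\exists v \in B$ such that $\max_{P \in \mathcal{P}_v^k(A)} |P| \ge f+1$.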
[Condition $k$-CCA] For any partition $L, R$ of $V$, where $L$ and $R$ are both nonempty, either $L \Rightarrow_k R$ or $R \Rightarrow_k L$.
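For very small graphs, $\Rightarrow_k$ and Condition $k$-CCA can be checked by enumerating the simple paths of length at most $k$ into each node and searching for the largest node-disjoint subfamily. The search is exponential, so this is only an illustrative sketch; all names are hypothetical, and the dict-of-outgoing-neighbor-sets encoding is our own. On the four-node ring with $f = 1$ it reproduces the introduction's observation: Condition 1-CCA fails, while Condition 2-CCA holds.

```python
from itertools import combinations


def paths_into(graph, A, v, k):
    """Simple paths of at most k edges from a node of A to v (v omitted).

    Extension stops at the first node of A reached: its suffix to v is a
    valid path on fewer nodes, so longer paths never help disjointness.
    """
    preds = {w: {u for u in graph if w in graph[u]} for w in graph}
    found = []

    def extend(path):                  # path = (x_0, ..., v), built backwards
        head = path[0]
        if head in A:
            found.append(path[:-1])
            return
        if len(path) - 1 < k:          # fewer than k edges so far
            for u in preds[head]:
                if u not in path:
                    extend((u,) + path)

    for u in preds[v]:
        extend((u, v))
    return found


def max_disjoint(paths):
    """Maximum number of pairwise node-disjoint paths (brute force)."""
    best = 0

    def search(start, used, count):
        nonlocal best
        best = max(best, count)
        for idx in range(start, len(paths)):
            if used.isdisjoint(paths[idx]):
                search(idx + 1, used | set(paths[idx]), count + 1)

    search(0, frozenset(), 0)
    return best


def implies_k(graph, A, B, f, k):
    """A =>_k B: some v in B has f + 1 disjoint paths of length <= k from A."""
    A = set(A)
    return any(max_disjoint(paths_into(graph, A, v, k)) >= f + 1 for v in B)


def partitions(nodes):
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            L = {first, *extra}
            yield L, set(nodes) - L


def satisfies_k_cca(graph, f, k):
    return all(implies_k(graph, L, R, f, k) or implies_k(graph, R, L, f, k)
               for L, R in partitions(graph))


ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(satisfies_k_cca(ring, 1, 1), satisfies_k_cca(ring, 1, 2))  # False True
```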
The necessity of Condition $k$-CCA for achieving asynchronous approximate consensus through an iterative $k$-hop algorithm holds analogously to the one-hop case, where the set of $f+1$ incoming neighbors of node $v$ has to be replaced with a set of $f+1$ distinct nodes that reach $v$ through disjoint paths. For sufficiency, we next present a generalization of Algorithm 1-LocWA for the $k$-hop case. There are two differences between Algorithms 1-LocWA and $k$-LocWA: (i) nodes transmit their state to all their $k$-hop outgoing neighbors, and (ii) Algorithm $k$-LocWA relies on the generalized version of Condition 1-WAIT, presented below.
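Condition $k$-WAIT: For $F \subseteq V$, we denote by $\mathit{reach}_i^k(F)$ the set of nodes that have paths of length at most $k$ to node $i$ in $G(V \setminus F)$, the subgraph of $G$ induced by $V \setminus F$. That is, $\mathit{reach}_i^k(F)$ is the set of $k$-hop incoming neighbors of $i$ that remain connected with $i$ (within $k$ hops) even when all nodes in set $F$ crash. The condition is satisfied at node $i$ in phase $t$ if there exists $F_i \subseteq V$ with $|F_i| \le f$ such that $\mathit{reach}_i^k(F_i) \subseteq \mathit{heard}_i[t]$.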
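Condition $k$-WAIT can be evaluated with a depth-limited breadth-first search over incoming edges. A minimal Python sketch follows (hypothetical names, dict-of-outgoing-neighbor-sets encoding). It only tries candidate sets $F$ drawn from $\mathit{reach}_i^k(\emptyset)$, since a node outside the $k$-hop in-neighborhood cannot lie on any path of length at most $k$ into $i$. For $k = 1$, $\mathit{reach}_i^1(F) = N_i^- \setminus F$, so the check coincides with Condition 1-WAIT, which the test below confirms on a few inputs.

```python
from itertools import combinations


def reach_k(graph, i, k, F=frozenset()):
    """reach_i^k(F): nodes with a path of at most k edges to i in G(V - F)."""
    preds = {w: {u for u in graph if w in graph[u]} for w in graph}
    seen, frontier = set(), {i}
    for _ in range(k):             # breadth-first search, one layer per hop
        frontier = {u for w in frontier for u in preds[w]
                    if u != i and u not in F and u not in seen}
        seen |= frontier
    return seen


def k_wait(graph, i, heard, f, k):
    """Condition k-WAIT at node i, given the set of senders heard in phase t."""
    candidates = sorted(reach_k(graph, i, k))
    return any(reach_k(graph, i, k, set(F)) <= set(heard)
               for size in range(f + 1)
               for F in combinations(candidates, size))


ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(k_wait(ring, "a", {"b", "c"}, f=1, k=2))  # True: choose F = {d}
print(k_wait(ring, "a", {"b"}, f=1, k=2))       # False
```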
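Algorithm $k$-LocWA for node $i$
$v_i[0] := x_i$ // input at node $i$
For phase $t \ge 1$:
* On entering phase $t$:
$R_i[t] := \{v_i[t-1]\}$; $\mathit{heard}_i[t] := \emptyset$
Send message $(v_i[t-1], i, t)$ to all nodes in $N_i^{k+}$, i.e., all $k$-hop outgoing neighbors of $i$ (for brevity, we do not specify how the network routes the messages within the $k$-hop neighborhood; this can be achieved by local flooding, tagging each message with a hop counter)
* When message $(h, j, t)$ is received for the first time:
$R_i[t] := R_i[t] \cup \{h\}$ // $R_i[t]$ is a multiset
$\mathit{heard}_i[t] := \mathit{heard}_i[t] \cup \{j\}$
* When Condition $k$-WAIT holds for the first time in phase $t$:
$v_i[t] := \frac{\sum_{h \in R_i[t]} h}{|R_i[t]|}$
Enter phase $t+1$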
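To see $k$-LocWA in action, the toy simulation below models asynchrony by delivering in-flight messages in a random order. It is a sketch under simplifying assumptions, not the authors' implementation: crashed nodes crash before sending anything, $k$-hop relaying is abstracted as direct delivery to every $k$-hop outgoing neighbor, the update is assumed to be the plain average of the node's own previous state and the received values, and all names are hypothetical. On the four-node ring with $f = 1$ and $k = 2$, each fault-free node averages at least three of the four phase values, so any two nodes share at least two of them, and the spread at least halves in every phase.

```python
import itertools
import random


def reach_k(graph, i, k, F=frozenset()):
    """Nodes with a path of at most k edges to i that avoids every node of F."""
    preds = {w: {u for u in graph if w in graph[u]} for w in graph}
    seen, frontier = set(), {i}
    for _ in range(k):
        frontier = {u for w in frontier for u in preds[w]
                    if u != i and u not in F and u not in seen}
        seen |= frontier
    return seen


def k_wait(graph, i, heard, f, k):
    cands = sorted(reach_k(graph, i, k))
    return any(reach_k(graph, i, k, set(F)) <= heard
               for size in range(f + 1)
               for F in itertools.combinations(cands, size))


def simulate_k_locwa(graph, inputs, f, k, phases, crashed=(), seed=0):
    """Toy asynchronous run of k-LocWA; returns v_i[phases] of correct nodes."""
    rng = random.Random(seed)
    out_k = {i: {j for j in graph if i in reach_k(graph, j, k)} for i in graph}
    value = {i: {0: inputs[i]} for i in graph}   # value[i][t] = v_i[t]
    phase = {i: 1 for i in graph}
    inbox = {i: {} for i in graph}                # inbox[i][t][j] = v_j[t-1]
    pool = []                                     # in-flight (src, dst, t, val)

    def enter(i):
        # On entering phase t: send (v_i[t-1], i, t) to k-hop out-neighbours.
        t = phase[i]
        if t <= phases:
            pool.extend((i, j, t, value[i][t - 1]) for j in out_k[i])

    def try_update(i):
        # Average as soon as Condition k-WAIT first holds in the current phase.
        while phase[i] <= phases:
            t = phase[i]
            got = inbox[i].get(t, {})
            if not k_wait(graph, i, set(got), f, k):
                return
            vals = [value[i][t - 1], *got.values()]
            value[i][t] = sum(vals) / len(vals)
            phase[i] = t + 1
            enter(i)

    alive = [i for i in graph if i not in crashed]
    for i in alive:
        enter(i)
        try_update(i)
    while pool:
        src, dst, t, val = pool.pop(rng.randrange(len(pool)))
        if dst not in crashed:
            inbox[dst].setdefault(t, {}).setdefault(src, val)
            try_update(dst)
    assert all(phase[i] == phases + 1 for i in alive), "a correct node is stuck"
    return {i: value[i][phases] for i in alive}


ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
x = {"a": 0.0, "b": 10.0, "c": 4.0, "d": 7.0}
print(simulate_k_locwa(ring, x, f=1, k=2, phases=40, seed=1))
```

Passing `crashed={"c"}` lets one check that the remaining nodes still finish every phase, since each of them can satisfy Condition 2-WAIT with $F = \{c\}$.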
Correctness of Algorithm $k$-LocWA Proving the correctness of $k$-LocWA follows reasoning similar to the correctness proof of 1-LocWA. The key here is to identify Condition $k$-CCA and Condition $k$-WAIT so that the proof structure remains almost identical. To adapt the arguments to the general case, one should define the analogue of $\mathit{in}(A \Rightarrow B)$ based on the general relation $\Rightarrow_k$.
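For disjoint sets $A, B$, $\mathit{in}_k(A \Rightarrow B)$ denotes the set of all the nodes $v$ in $B$ for which there exist at least $f+1$ incoming disjoint paths of length at most $k$ from distinct nodes in $A$ to $v$. When $A \not\Rightarrow_k B$, define $\mathit{in}_k(A \Rightarrow B) = \emptyset$. Formally, in the terminology of Definition 3.2: $\mathit{in}_k(A \Rightarrow B) = \{ v \in B \mid \max_{P \in \mathcal{P}_v^k(A)} |P| \ge f+1 \}$.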
The correctness proof of Algorithm LocWA is similar to the proof of Theorem 3.1; remarks on the arguments’ adaptations are presented in the proof sketch of the following theorem.
Approximate crash-tolerant consensus in an asynchronous system using iterative $k$-hop algorithms is feasible iff $G$ satisfies Condition $k$-CCA.
Proof Sketch: Having defined the basic notion $\mathit{in}_k(A \Rightarrow B)$, Definition 3.1 of the notion “$A$ propagates to $B$” is the same for the $k$-hop case. Intuitively, if $A$ propagates to $B$, information will be propagated gradually from $A$ to $B$ in $l$ steps; the crash of any set of at most $f$ nodes cannot block propagation to a specific node $v$, because the definition of $\mathit{in}_k$ guarantees that $v$ receives information over at least $f+1$ disjoint paths if it has not crashed. A difference from the original case $k = 1$ is that each of the $l$ steps needed to propagate from $A$ to $B$ may require $k$ communication steps in the worst case, since information may be propagated through paths of length $k$. Lemma 3.1 is intuitively the same, since it is based on the general propagation notion, but the value that is defined based on the number of incoming neighbors will now be defined based on the number of $k$-hop incoming neighbors, i.e., on $|N_i^{k-}|$ instead of $|N_i^-|$. The main correctness proof remains essentially the same, since it repeatedly makes use of the abstract propagation notion between various sets, without focusing on how the values are propagated.
3.3 Condition Relation and Convergence Time Comparison
Next, we compare the feasibility of approximate consensus for different values of $k$ by presenting a relation among the various $k$-CCA conditions, as well as their relation with Condition CCA from [38].
Condition Relation
We first show that a lower $k$ requires higher connectivity of the graph, as stated below.
For values $k, k'$ with $1 \le k < k' \le n$, Condition $k$-CCA implies Condition $k'$-CCA.
Proof.
Let Condition $k$-CCA hold and assume, without loss of generality, that $L \Rightarrow_k R$ holds for a partition $L, R$. This means that there exists a node $v$ in $R$ that has at least $f+1$ incoming disjoint paths of length at most $k$ initiating from distinct nodes in $L$. Consequently, the same paths constitute $f+1$ incoming disjoint paths of $v$ of length at most $k'$, since $k < k'$; thus $L \Rightarrow_{k'} R$, which means that Condition $k'$-CCA holds. ∎
We next show that Condition $n$-CCA is equivalent to Condition CCA. The proof illustrates how the locally defined Condition $k$-CCA naturally coincides with the globally defined Condition CCA in the extreme case $k = n$.
Condition $n$-CCA is equivalent to Condition CCA.
Proof.
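It is easy to see that Condition $n$-CCA implies Condition CCA: if Condition CCA is violated in $G$, then Condition $n$-CCA does not hold either. Indeed, a violating partition $L, R$ satisfies $|N_R^- \cap L| \le f$ and $|N_L^- \cap R| \le f$; every path from $L$ to a node of $R$ must contain a node of $N_R^- \cap L$, and node-disjoint paths contain distinct such nodes, so no node of $R$ has $f+1$ disjoint paths from distinct nodes in $L$ (and symmetrically for $L$).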
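Now, we show the other direction. Assume for the sake of contradiction that Condition CCA holds but Condition $n$-CCA does not. Then, there exists a partition $L, R$ with $L, R \neq \emptyset$ such that $L \not\Rightarrow_n R$ and $R \not\Rightarrow_n L$. Since Condition CCA holds, we have that either $L \Rightarrow R$ or $R \Rightarrow L$. Now consider the case that $L \Rightarrow R$ and $R \not\Rightarrow L$. This means that $L \Rightarrow R$ and $L \not\Rightarrow_n R$. The case of $R \Rightarrow L$ and $L \not\Rightarrow R$ is symmetrical, and the case of $L \Rightarrow R$ and $R \Rightarrow L$ can be proved by applying the argument below once for set $L$ and once for set $R$.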
Let be the node in with the maximum number of disjoint paths initiating from distinct nodes in (as implied by Definition 3.2). The fact implies that . Subsequently, implies that the set is nonempty (the maximal subset of which does not contain any hop incoming neighbors of ). Let be the set of all the outgoing hop neighbors of all nodes confined in the set . By definition of and , it holds that . We can now create a new partition by moving from to . For partition it holds that since and . Moreover, it holds that (i) , since and ; and (ii) since . The latter points imply that and , which yield a contradiction to the hypothesis that Condition CCA holds. This completes the proof. ∎
Convergence Time Comparison
We derive upper bounds on the number of asynchronous phases needed for $\epsilon$-convergence of Algorithm $k$-LocWA and on its message complexity up to this convergence point. These upper bounds are functions of parameters, including $k$, that are naturally expected to affect the convergence time and message complexity. Moreover, since the bounds depend on $k$, they provide a way to compare the convergence time and message complexity of Algorithms $k$-LocWA for different values of $k$. We will use the following lemma to compute the number of phases needed for convergence of Algorithm $k$-LocWA.
For any phase of LocWA, if , then there exists an integer , such that, for , the following holds,
The proof of the lemma is given in the proof of Theorem 3.1 and is based on the generalization of Lemma 3.1 to the $k$-hop case, which is obtained by replacing the one-hop quantities with their $k$-hop counterparts. Next, we present the upper bound on the convergence time of $k$-LocWA; the full proof is in [34].
[Convergencetime complexity] The number of phases required by Algorithm LocWA to converge is .
Proof.
The idea is to repeatedly apply Lemma 3.3 until $M[t] - m[t]$ is less than $\epsilon$.
Observe that , else Condition CCA is violated. Also, and ; hence, . We will denote by for succinctness. Assume wlog that , and define the following sequence of phase indices:


,

for , , where for any given is defined by Lemma 3.3.
By repeated application of Lemma 3.3, we have that for ,
so, convergence will be achieved in phase , where . Since for every , we have that,
By the definition of the sequence and the bound of all we have that . Thus, the algorithm will converge by phase the latest. ∎
Comparison of Algorithms $k$-LocWA Convergence Observe that the above bound decreases as the maximum number of $k$-hop incoming neighbors increases. Since the maximum number of $k$-hop incoming neighbors increases with $k$, we have that for $k < k'$, Algorithm $k'$-LocWA converges faster than $k$-LocWA by a factor implied by the bound.
Moreover, given the upper bound of Theorem 3.3 on the number of phases needed for convergence, we can easily derive an upper bound on the message complexity of $k$-LocWA. Namely,
[Message Complexity] The number of messages exchanged in an execution of Algorithm LocWA until convergence is
Proof.
This holds because each phase of Algorithm $k$-LocWA may require $k$ communication steps for length-$k$ paths to propagate values to a receiver. In the worst case, each node sends to all of its neighbors in every communication step. ∎
4 Topology Discovery and Unlimited Relay Depth
In this section, we consider the case with one-hop topology knowledge and relay depth $n$. In other words, nodes initially only know their immediate incoming and outgoing neighbors, but nodes can flood the network and learn the topology. The study of this case is motivated by the observation that full topology knowledge at each node (e.g., [38, 20, 17]) requires a much higher deployment and configuration cost. We show that Condition CCA from [38] is necessary and sufficient for solving approximate consensus with one-hop neighborhood knowledge and relay depth $n$ in asynchronous directed networks. Compared to the iterative $k$-hop algorithms in Section 3, the algorithms in this section are not restricted, in the sense that nodes can propagate any messages to all the reachable nodes.
The necessity of Condition CCA is implied by our prior work [38]. The algorithms presented below are again inspired by Algorithm WA from [38]. The main contribution is to show how each node can learn “enough” topology information to solve approximate consensus; this technique may be of interest in other contexts as well. In the discussion below, we present an algorithm that works in any directed graph that satisfies Condition CCA.
Algorithm LWA The idea of Algorithm LWA (Learn-Wait-Average) is to piggyback the information of incoming neighbors when propagating state values. Then, each node $i$ locally constructs an estimated graph $G_i[t] = (V_i[t], E_i[t])$ in every phase $t$, and checks whether Condition WAIT holds in $G_i[t]$ or not. Note that $G_i[t]$ may not be equal to $G$, as node $i$ may not receive messages from some other nodes due to asynchrony or failures. We say Condition WAIT holds in the local estimated graph $G_i[t]$ if there exists a set $F_i \subseteq V_i[t]$, where $|F_i| \le f$, such that $\mathit{reach}_i(F_i) \subseteq \mathit{heard}_i[t]$. Here, $\mathit{reach}_i(F_i)$ is the set of nodes that have paths to node $i$ in the subgraph of $G_i[t]$ induced by the nodes in $V_i[t] \setminus F_i$.
Recall that $N_i^-$ denotes the set of node $i$'s one-hop incoming neighbors. Given a set of nodes $A$ and a node $i$, we also use a notation for the directed graph consisting of the nodes in $A \cup \{i\}$ and the directed edges from each node in $A$ to $i$. Formally, this graph is $(A \cup \{i\}, \{(j, i) : j \in A\})$.
Algorithm LWA for node $i$
input at node
For phase $p \ge 1$:
* On entering phase $p$:
Send message to all the outgoing neighbors
* When message is received for the first time:
// is a multiset
Footnote 4: The graph union is taken over node sets and edge sets, where and . Note that the result is not a multiset; there is only one copy of each node or edge.
Send message to all the outgoing neighbors
* When Condition WAIT holds on the learned graph for the first time in phase $p$:
// “Reset” the learned graph
Enter phase $p+1$
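The learned-graph bookkeeping and the Condition WAIT test can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's pseudocode. Graphs are (node set, edge set) pairs, and incoming-neighbor information is merged by plain set union, so there are no duplicate nodes or edges. WAIT is read as in Algorithm WA of [38]: some set F of at most f nodes exists such that node i has heard from every node with a path to i once F is removed. The function names and the brute-force search over F are hypothetical.

```python
from itertools import combinations

def star_graph(incoming, i):
    """Graph with nodes incoming | {i} and one edge (j, i) per j in incoming."""
    return set(incoming) | {i}, {(j, i) for j in incoming}

def union(g, h):
    """Merge two graphs by set union (not a multiset: one copy per node/edge)."""
    return g[0] | h[0], g[1] | h[1]

def reach(graph, i, removed):
    """Nodes with a directed path to i in the subgraph induced by nodes - removed."""
    nodes, edges = graph
    alive = nodes - removed
    if i not in alive:
        return set()
    seen, stack = {i}, [i]
    while stack:
        v = stack.pop()
        for (u, w) in edges:  # walk edges backwards from i
            if w == v and u in alive and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def wait_holds(graph, i, heard, f):
    """Hypothetical WAIT check: some F, |F| <= f, i not in F, has reach(F) <= heard.

    `heard` is assumed to contain i itself (a node always knows its own value).
    """
    candidates = sorted(graph[0] - {i})
    for size in range(f + 1):
        for F in combinations(candidates, size):
            if reach(graph, i, set(F)) <= heard:
                return True
    return False

# Node i learns that a and b point to i, and that c points to a.
g = union(star_graph({"a", "b"}, "i"), star_graph({"c"}, "a"))
print(wait_holds(g, "i", {"i", "b"}, f=1))  # True: take F = {a}
print(wait_holds(g, "i", {"i", "b"}, f=0))  # False: a and c are still unheard
```

With f = 1, node i can stop waiting after hearing only from b, because treating a as possibly crashed cuts off both a and c. With f = 0, it must keep waiting.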
Correctness of Algorithm LWA The key lemma in the correctness proof of Algorithm WA in [38] shows that any pair of nodes that have not crashed in phase $p$ must receive a state value from at least one common node. In Appendix D, we show that Algorithm LWA achieves the same property. Intuitively, if Condition WAIT does not hold in its local estimated graph, then node $i$ knows that it can still learn more state values in phase $p$. Conversely, when Condition WAIT is satisfied in the estimated graph, there exists a scenario in which node $i$ cannot receive any more information; hence, it should not wait for further messages. This is why Algorithm LWA allows each node to learn enough state values to achieve approximate consensus. We rely on this observation to prove correctness in [34].
Undirected Graphs Algorithm LWA works on undirected graphs as well. However, the message size is large, since each message needs to include information about a node's neighborhood. In Appendix E, we present an algorithm in which each node learns the topology in the first phase and then executes an approximate consensus algorithm using the learned topology. This approach works in undirected graphs for two reasons: (i) Condition CCA is equivalent to $(f+1)$-connectivity and $n > 2f$ in undirected graphs; and (ii) each node has at least one fault-free neighbor, so each node is able to learn of the existence of every other node.
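The first-phase topology discovery for undirected graphs can be illustrated with a small round-based simulation. Appendix E's algorithm is asynchronous and is not reproduced here. This is only a hypothetical sketch, assuming synchronous rounds and nodes that crash before sending anything. Each live node repeatedly forwards every edge it knows to its neighbors. The sketch illustrates observation (ii): a crashed node's existence is still learned through its fault-free neighbors.

```python
def flood_topology(adj, crashed, rounds):
    """Simulate flooding of neighbor lists in an undirected graph.

    adj: dict mapping each node to its set of neighbors (undirected).
    crashed: nodes assumed to crash before sending anything.
    Returns, for every non-crashed node, the set of edges it has learned.
    """
    # Initially each node knows only its own incident edges.
    known = {v: {frozenset((v, u)) for u in adj[v]} for v in adj}
    for _ in range(rounds):
        snapshot = {v: set(edges) for v, edges in known.items()}
        for v in adj:
            if v in crashed:
                continue  # crashed nodes never send
            for u in adj[v]:
                known[u] |= snapshot[v]
    return {v: known[v] for v in adj if v not in crashed}

def nodes_of(edges):
    """All endpoints mentioned in a set of undirected edges."""
    return set().union(*edges) if edges else set()

ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
learned = flood_topology(ring, crashed={2}, rounds=len(ring))
for v, edges in sorted(learned.items()):
    print(v, sorted(nodes_of(edges)), len(edges))
```

The 4-node ring has connectivity 2, so it tolerates up to f = 1 crash. Crashing node 2 does not hide it: nodes 0, 1, and 3 each still learn all four edges.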
5 Discussion
In this section, we discuss interesting implications of the conditions derived in this paper.
5.1 Fault-tolerance
In undirected graphs, $(f+1)$-connectivity and $n > 2f$ are both necessary and sufficient for solving approximate consensus in asynchronous networks with up to $f$ crash faults (implied by [20, 17]). It is easy to show that Condition CCA for tolerating $f$ faults is equivalent to these two conditions in undirected networks. However, this equivalence does not hold for general $k$. For example, the network in Figure 0(a) has connectivity and four nodes, but does not satisfy Condition CCA with (when ).
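For undirected graphs, the classic conditions above can be checked by brute force on small examples. The sketch below is only illustrative. It tests $n > 2f$ and that the graph stays connected after removing any set of at most $f$ nodes, which is $(f+1)$-connectivity when $n > f + 1$. It does not test Condition CCA or its $k$-hop variants, and the function names are hypothetical. The running time is exponential in $f$, so it is meant for toy graphs only.

```python
from itertools import combinations

def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == nodes

def undirected_crash_condition(adj, f):
    """Check n > 2f and connectivity after removing any <= f nodes."""
    n = len(adj)
    if n <= 2 * f:
        return False
    for size in range(f + 1):
        for removed in combinations(adj, size):
            if not is_connected(adj, set(adj) - set(removed)):
                return False
    return True

ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(undirected_crash_condition(ring, 1))  # True
print(undirected_crash_condition(path, 1))  # False: removing node 1 disconnects it
```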
More interestingly, increasing the topology knowledge and relay depth by a small amount may increase the fault tolerance substantially. Consider the network in Figure 0(b). Condition CCA does not hold for (when left clique, right clique, and ). On the other hand, Condition CCA holds for . Intuitively, this is because every pair of nodes is at most two hops apart.
5.2 Real-Time Speedup of Algorithm $k$-LocWA
In asynchronous systems, the real-time communication delay is arbitrary but finite. In a formal framework, it is common to assume that execution proceeds in rounds representing real-time intervals, although nodes do not know the round index. To model the worst-case real-time delay in an execution, we use the notion of a delay scenario, which describes the delays incurred by communication over every edge of the network. The delivery delay of a message sent over a channel is the number of rounds (amount of real time) needed for the delivery to complete.
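A delay scenario can be modeled as a map from directed edges to delivery delays in rounds. The sketch below makes two simplifying assumptions: relays forward immediately, and delays add up along a path. Under those assumptions, it computes the earliest round at which a message sent at round 0 reaches each node when relaying is limited to a given number of hops. It is a hop-limited Bellman-Ford relaxation, and the function name and parameters are hypothetical.

```python
import math

def earliest_arrival(nodes, delay, source, max_hops):
    """Earliest round at which source's round-0 message reaches each node.

    delay: dict mapping a directed edge (u, v) to its delivery delay in rounds
    (one delay scenario). Only paths with at most max_hops edges are used.
    """
    best = {v: math.inf for v in nodes}
    best[source] = 0
    for _ in range(max_hops):
        nxt = dict(best)  # relax from the previous iteration only: one more hop
        for (u, v), d in delay.items():
            if best[u] + d < nxt[v]:
                nxt[v] = best[u] + d
        best = nxt
    return best

nodes = {"a", "b", "c"}
delay = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 10}
print(earliest_arrival(nodes, delay, "a", 1)["c"])  # 10 (direct edge only)
print(earliest_arrival(nodes, delay, "a", 2)["c"])  # 2 (via b)
```

Limiting relay to one hop forces the message onto the slow direct edge (a, c), whereas allowing two hops lets it arrive at c in round 2. Which variant of $k$-LocWA finishes a phase first also depends on which WAIT condition it must satisfy, not only on arrival times.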
We first compare the real-time performance of Algorithms $k$-LocWA for different values of $k$ with respect to real-time delay. Specifically, we show that there is a case where Algorithm 1-LocWA terminates each phase in one round (one interval of real time), while Algorithm 2-LocWA may take an arbitrary number of rounds to terminate phase 1. To formalize the comparison, we use the notion of the convergence time of Algorithms $k$-LocWA.
Consider the graph of Figure 1(a), which is a ring network plus a directed edge . For , it is easy to verify that Condition 1-CCA holds, which implies that Conditions $k$-CCA hold for all $k \ge 1$. Assume that the delivery of messages through directed edges is delayed by rounds, while communication over all the other edges is instant (1 round). For ease of presentation, assume that no node crashes. Then, in an execution of Algorithm 1-LocWA, every node finishes each phase within one round: in each phase, it receives a message from all of its neighbors except one within one round, and thus Condition 1-WAIT is satisfied.
On the other hand, in an execution of Algorithm 2-LocWA, node will only receive a message from in one round, since is a directed edge, and the delay on edges and is . In this case, will not be able to decide before round , the first round in which Condition 2-WAIT is satisfied. Specifically, for the first phase, it holds that only after round since, if considers as a possible corruption set, it has to wait for a message from which will be propagated by , and setting , it has to wait for a message from . Consequently, the first round in which node can decide is round , when it receives the rest of the values. For similar reasons, the same holds for nodes . Since may be an arbitrary integer, there is a delay scenario in which the convergence time of Algorithm 2-LocWA is arbitrarily larger than that of Algorithm 1-LocWA.
Strong version of $k$-LocWA with respect to real time In Example 5.2, observe that in the 2-hop knowledge case (execution of 2-LocWA), a node has all the information it would have in the 1-hop knowledge case. Therefore, it can use this information to update its state value in the same manner as 1-LocWA, which guarantees a convergence time no worse than that of 1-LocWA. As a result, the modified algorithm is always at least as fast, in terms of real time, as 1-LocWA. Next, we modify the update condition of Algorithm $k$-LocWA to capture this strengthened version with respect to real time.
Update Condition of Strong $k$-LocWA In the strong version of Algorithm $k$-LocWA, a node updates its value the first time that at least one of Conditions $\ell$-WAIT, for $1 \le \ell \le k$, holds. Specifically, we replace the update condition of Algorithm $k$-LocWA with:


Update value when Condition $\ell$-WAIT holds for some $1 \le \ell \le k$ for the first time in phase $p$:
Considering this strong version of the algorithm family $k$-LocWA, we can show that for any $k < k'$, Algorithm Strong $k'$-LocWA converges at least as fast as Algorithm Strong $k$-LocWA. That is, for every delay scenario, the number of rounds in which Strong $k$-LocWA converges is at least the number of rounds in which Strong $k'$-LocWA converges. The proof is straightforward: Strong $k'$-LocWA checks all the update conditions for smaller values of $\ell$, including $\ell = k$, and the messages communicated in $k'$-LocWA are a superset of those communicated in $k$-LocWA. Also observe that if Strong $k$-LocWA converges, then so does Strong $k'$-LocWA. Thus, we have the following corollary.
For $k < k'$, if Strong $k$-LocWA converges in $r$ rounds, then Strong $k'$-LocWA converges in $r'$ rounds with $r' \le r$.
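The monotonicity argument can be illustrated for a single phase. Fix one node, one phase, and one delay scenario, and let $T_\ell$ denote the first round at which Condition $\ell$-WAIT holds. The strong update rule then fires at $\min_{1 \le \ell \le k} T_\ell$, and a minimum over a larger range of $\ell$ can only be smaller or equal. The sketch assumes the $T_\ell$ values are given and does not simulate message passing, so it is an illustration rather than a proof. The function name and the numbers are hypothetical.

```python
import math

def strong_update_round(wait_rounds, k):
    """First round at which any Condition l-WAIT with 1 <= l <= k holds.

    wait_rounds: hypothetical dict mapping l to the round at which l-WAIT first
    holds for one node in one phase (missing or math.inf means it never holds).
    """
    return min((wait_rounds.get(l, math.inf) for l in range(1, k + 1)),
               default=math.inf)

# 1-WAIT holds after one round; 2-WAIT only after a long delay.
wait_rounds = {1: 1, 2: 101}
print(strong_update_round(wait_rounds, 1))  # 1
print(strong_update_round(wait_rounds, 2))  # 1, not 101
```

This mirrors the ring example above: with 1-WAIT holding after one round and 2-WAIT only after a long delay, the strong 2-hop rule still updates in round 1.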
References
 [1] Ittai Abraham, Yonatan Amit, and Danny Dolev. Optimal resilience asynchronous approximate agreement. In OPODIS, pages 229–239, 2004.
 [2] Eduardo A. P. Alchieri, Alysson Neves Bessani, Joni Silva Fraga, and Fabíola Greve. Byzantine consensus with unknown participants. In Theodore P. Baker, Alain Bui, and Sébastien Tixeuil, editors, Principles of Distributed Systems, volume 5401 of Lecture Notes in Computer Science, pages 22–40. Springer Berlin Heidelberg, 2008. URL: http://dx.doi.org/10.1007/978-3-540-92221-6_4, doi:10.1007/978-3-540-92221-6_4.
 [3] John Augustine, Gopal Pandurangan, and Peter Robinson. Fast byzantine agreement in dynamic networks. In Proceedings of the 2013 ACM Symposium on Principles of Distributed Computing, PODC ’13, pages 74–83, New York, NY, USA, 2013. ACM. URL: http://doi.acm.org/10.1145/2484239.2484275, doi:10.1145/2484239.2484275.
 [4] Piyush Bansal, Prasant Gopal, Anuj Gupta, Kannan Srinathan, and Pranav Kumar Vasishta. Byzantine agreement using partial authentication. In Proceedings of the 25th international conference on Distributed computing, DISC’11, pages 389–403, Berlin, Heidelberg, 2011. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=2075029.2075079.
 [5] Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Optimization and Neural Computation Series. Athena Scientific, 1997.
 [6] Martin Biely, Peter Robinson, and Ulrich Schmid. Easy impossibility proofs for k-set agreement in message passing systems. In Proceedings of the 15th International Conference on Principles of Distributed Systems, OPODIS’11, pages 299–312, Berlin, Heidelberg, 2011. Springer-Verlag. URL: http://dx.doi.org/10.1007/978-3-642-25873-2_21, doi:10.1007/978-3-642-25873-2_21.
 [7] Martin Biely, Peter Robinson, and Ulrich Schmid. Agreement in directed dynamic networks. In Guy Even and Magnús M. Halldórsson, editors, Structural Information and Communication Complexity, pages 73–84, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
 [8] Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. Gracefully degrading consensus and k-set agreement in directed dynamic networks. CoRR, abs/1408.0620, 2014. URL: http://arxiv.org/abs/1408.0620.
 [9] Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. Gracefully degrading consensus and k-set agreement in directed dynamic networks. In Ahmed Bouajjani and Hugues Fauconnier, editors, Networked Systems, pages 109–124, Cham, 2015. Springer International Publishing.
 [10] Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. Gracefully degrading consensus and k-set agreement in directed dynamic networks. Theoretical Computer Science, 726:41–77, 2018. URL: http://www.sciencedirect.com/science/article/pii/S0304397518301166, doi:10.1016/j.tcs.2018.02.019.
 [11] Bernadette Charron-Bost, Matthias Függer, and Thomas Nowak. Approximate consensus in highly dynamic networks. CoRR, abs/1408.0620, 2014. URL: http://arxiv.org/abs/1408.0620.
 [12] Bernadette Charron-Bost, Matthias Függer, and Thomas Nowak. Approximate consensus in highly dynamic networks: The role of averaging algorithms. In Proceedings, Part II, of the 42nd International Colloquium on Automata, Languages, and Programming, Volume 9135, ICALP 2015, pages 528–539, New York, NY, USA, 2015. Springer-Verlag New York, Inc. URL: http://dx.doi.org/10.1007/978-3-662-47666-6_42, doi:10.1007/978-3-662-47666-6_42.
 [13] Ashish Choudhury, Gayathri Garimella, Arpita Patra, Divya Ravi, and Pratik Sarkar. Brief announcement: Crash-tolerant consensus in directed graph revisited. In 31st International Symposium on Distributed Computing, DISC 2017, October 16–20, 2017, Vienna, Austria, pages 46:1–46:4, 2017. URL: https://doi.org/10.4230/LIPIcs.DISC.2017.46, doi:10.4230/LIPIcs.DISC.2017.46.
 [14] Étienne Coulouma and Emmanuel Godard. A characterization of dynamic networks where consensus is solvable. In Thomas Moscibroda and Adele A. Rescigno, editors, Structural Information and Communication Complexity, pages 24–35, Cham, 2013. Springer International Publishing.
 [15] Yvo Desmedt and Yongge Wang. Perfectly secure message transmission revisited. In Lars R. Knudsen, editor, Advances in Cryptology – EUROCRYPT 2002, volume 2332 of Lecture Notes in Computer Science, pages 502–517. Springer Berlin Heidelberg, 2002. URL: http://dx.doi.org/10.1007/3-540-46035-7_33, doi:10.1007/3-540-46035-7_33.
 [16] S. M. Dibaji, H. Ishii, and R. Tempo. Resilient randomized quantized consensus. IEEE Transactions on Automatic Control, PP(99):1–1, 2017. doi:10.1109/TAC.2017.2771363.
 [17] Danny Dolev. The Byzantine generals strike again. Journal of Algorithms, 3(1), March 1982.
 [18] Danny Dolev, Cynthia Dwork, Orli Waarts, and Moti Yung. Perfectly secure message transmission. Journal of the Association for Computing Machinery (JACM), 40(1):17–47, 1993.
 [19] Danny Dolev, Nancy A. Lynch, Shlomit S. Pinter, Eugene W. Stark, and William E. Weihl. Reaching approximate agreement in the presence of faults. J. ACM, 33:499–516, May 1986. URL: http://doi.acm.org/10.1145/5925.5931, doi:10.1145/5925.5931.
 [20] Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. Easy impossibility proofs for distributed consensus problems. In Proceedings of the fourth annual ACM symposium on Principles of distributed computing, PODC ’85, pages 59–70, New York, NY, USA, 1985. ACM. URL: http://doi.acm.org/10.1145/323596.323602, doi:10.1145/323596.323602.
 [21] Rachid Guerraoui and Bastian Pochon. The complexity of early deciding set agreement: How can topology help? Electronic Notes in Theoretical Computer Science, 230:71–78, 2009. Proceedings of the Workshops on Geometric and Topological Methods in Concurrency Theory (GETCO 2004+2005+2006). URL: http://www.sciencedirect.com/science/article/pii/S157106610900022X, doi:10.1016/j.entcs.2009.02.017.
 [22] A. Jadbabaie, Jie Lin, and A.S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. Automatic Control, IEEE Transactions on, 48(6):988–1001, June 2003. doi:10.1109/TAC.2003.812781.
 [23] Denis Jeanneau, Thibault Rieutord, Luciana Arantes, and Pierre Sens. Solving k-set agreement using failure detectors in unknown dynamic networks. IEEE Transactions on Parallel and Distributed Systems, 28(5):1484–1499, May 2017.
 [24] H. LeBlanc, H. Zhang, X. Koutsoukos, and S. Sundaram. Resilient asymptotic consensus in robust networks. IEEE Journal on Selected Areas in Communications: Special Issue on In-Network Computation, 31:766–781, April 2013.
 [25] Heath LeBlanc, Haotian Zhang, Shreyas Sundaram, and Xenofon Koutsoukos. Consensus of multi-agent networks in the presence of adversaries using only local information. HiCoNs, 2012.
 [26] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
 [27] Alexandre Maurer, Sébastien Tixeuil, and Xavier Défago. Reliable communication in a dynamic network in the presence of Byzantine faults. CoRR, abs/1402.0121, 2014. URL: http://arxiv.org/abs/1402.0121.
 [28] Mikhail Nesterenko and Sébastien Tixeuil. Discovering network topology in the presence of byzantine faults. In Structural Information and Communication Complexity, pages 212–226, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
 [29] A. Pagourtzis, G. Panagiotakos, and D. Sakavalas. Reliable broadcast with respect to topology knowledge. In Proceedings of the 28th international conference on Distributed computing (DISC), 2014.
 [30] Aris Pagourtzis, Giorgos Panagiotakos, and Dimitris Sakavalas. Reliable broadcast with respect to topology knowledge. Distributed Computing, 30(2):87–102, 2017. URL: https://doi.org/10.1007/s00446-016-0279-6, doi:10.1007/s00446-016-0279-6.
 [31] Aris Pagourtzis, Giorgos Panagiotakos, and Dimitris Sakavalas. Reliable communication via semilattice properties of partial knowledge. In Fundamentals of Computation Theory, 21st International Symposium, FCT 2017, Bordeaux, France, September 11–13, 2017, Proceedings, pages 367–380, 2017. URL: https://doi.org/10.1007/978-3-662-55751-8_29, doi:10.1007/978-3-662-55751-8_29.
 [32] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, April 1980. URL: http://doi.acm.org/10.1145/322186.322188, doi:10.1145/322186.322188.
 [33] David Peleg. Local majorities, coalitions and monopolies in graphs: a review. Theor. Comput. Sci., 282(2):231–257, 2002. URL: https://doi.org/10.1016/S0304-3975(01)00055-X, doi:10.1016/S0304-3975(01)00055-X.
 [34] Dimitris Sakavalas, Lewis Tseng, and Nitin H. Vaidya. Asynchronous crash-tolerant approximate consensus in directed graphs: Topology knowledge. CoRR, abs/1803.04513, 2018. URL: http://arxiv.org/abs/1803.04513, arXiv:1803.04513.
 [35] Bhavani Shankar, Prasant Gopal, Kannan Srinathan, and C. Pandu Rangan. Unconditionally reliable message transmission in directed networks. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, SODA ’08, pages 1048–1055, Philadelphia, PA, USA, 2008. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1347082.1347197.
 [36] Lili Su and Nitin Vaidya. Reaching approximate Byzantine consensus with multi-hop communication. In Andrzej Pelc and Alexander A. Schwarzmann, editors, Stabilization, Safety, and Security of Distributed Systems, volume 9212 of Lecture Notes in Computer Science, pages 21–35. Springer International Publishing, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21741-3_2, doi:10.1007/978-3-319-21741-3_2.
 [37] Lewis Tseng, Nitin Vaidya, and Vartika Bhandari. Broadcast using certified propagation algorithm in presence of Byzantine faults. Information Processing Letters, 115(4):512–514, 2015. URL: http://www.sciencedirect.com/science/article/pii/S0020019014002609, doi:10.1016/j.ipl.2014.11.010.
 [38] Lewis Tseng and Nitin H. Vaidya. Fault-tolerant consensus in directed graphs. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC ’15, pages 451–460, New York, NY, USA, 2015. ACM. URL: http://doi.acm.org/10.1145/2767386.2767399, doi:10.1145/2767386.2767399.
 [39] Lewis Tseng and Nitin H. Vaidya. Iterative approximate Byzantine consensus under a generalized fault model. In International Conference on Distributed Computing and Networking (ICDCN), January 2013.
 [40] Nitin H. Vaidya, Lewis Tseng, and Guanfeng Liang. Iterative approximate Byzantine consensus in arbitrary directed graphs. In Proceedings of the thirty-first annual ACM symposium on Principles of distributed computing, PODC ’12. ACM, 2012.
 [41] Kyrill Winkler, Manfred Schwarz, and Ulrich Schmid. Consensus in directed dynamic networks with short-lived stability. CoRR, abs/1602.05852, 2016. URL: http://arxiv.org/abs/1602.05852, arXiv:1602.05852.
 [42] H. Zhang and S. Sundaram. Robustness of complex networks with implications for consensus and contagion. In Proceedings of CDC 2012, the 51st IEEE Conference on Decision and Control, 2012.
 [43] H. Zhang and S. Sundaram. Robustness of distributed algorithms to locally bounded adversaries. In Proceedings of ACC 2012, the 31st American Control Conference, 2012.
Appendix A Additional Discussion of Related Work
A.1 Consensus
Lamport, Shostak, and Pease addressed the Byzantine consensus problem in [32]. Subsequent work [20, 17] characterized the necessary and sufficient conditions under which Byzantine consensus is solvable in undirected graphs. However, these conditions are not adequate to fully characterize the directed graphs in which Byzantine consensus is feasible.
Bansal et al. [4] identified tight conditions for achieving Byzantine consensus in undirected graphs using authentication. Bansal et al. discovered that all-pair reliable communication is not necessary to achieve consensus when using authentication. Our work differs from Bansal et al. in that our results apply in the absence of authentication or any other security primitives; also, our results apply to directed graphs. Alchieri et al. [2] explored the problem of achieving exact consensus in unknown networks with Byzantine nodes, but the underlying communication graph is assumed to be fully connected. In our work, each node has partial network knowledge, and we consider incomplete directed graphs.
A.2 Iterative Approximate Consensus
Many researchers in the decentralized control area, including Bertsekas and Tsitsiklis [5] and Jadbabaie, Lin, and Morse [22], have explored approximate consensus in the absence of faults, using only near-neighbor communication in systems wherein the communication graph may be partially connected and time-varying. Our work considers the case when nodes may suffer crash failures.
Our prior work [40, 39, 36] has considered a restricted class of iterative algorithms for achieving approximate Byzantine consensus in directed graphs, where fault-free nodes must agree on values that are approximately equal to each other using iterative algorithms with limited memory (in particular, the state carried by the nodes across iterations must be in the convex hull of inputs of the fault-free nodes, which precludes mechanisms such as multi-hop forwarding of messages). The conditions developed in such prior work are not necessary when no such restrictions are imposed. Independently, LeBlanc et al. [25, 24] and Zhang and Sundaram [43, 42] have developed results for iterative algorithms for approximate consensus under a weaker fault model, where a faulty node must send identical messages to all of its neighbors.
A.3 $k$-Set Consensus
$k$-set consensus has also received considerable attention under different graph assumptions. In complete graphs, Biely et al. [6] presented impossibility results for $k$-set consensus in various message-passing systems. Guerraoui and Pochon [21] studied early-deciding set agreement using algebraic topology techniques. Our work studies directed incomplete graphs. In synchronous dynamic networks, Biely et al. [8, 9] considered $k$-set consensus with fault-free nodes. Winkler et al. [41] solved exact consensus in synchronous dynamic networks with unreliable links. The main contribution in [41] was to identify the shortest period of stability that makes consensus feasible. In unknown and dynamic systems, Jeanneau et al. [23] relied on failure detectors to solve $k$-set consensus. These works only studied synchronous systems, whereas we consider exact and approximate crash-tolerant consensus in asynchronous systems. Moreover, we do not assume the existence of failure detectors.
A.4 Reliable Communication and Broadcast
Several papers have also addressed communication between a single source-receiver pair. Dolev et al. [18] studied the problem of secure communication, which achieves both fault tolerance and perfect secrecy between a single source-receiver pair in undirected graphs, in the presence of node and link failures. Desmedt and Wang considered the same problem in directed graphs [15]. Shankar et al. [35] investigated reliable communication between a source-receiver pair in directed graphs, allowing for an arbitrarily small error probability in the presence of Byzantine failures. Maurer et al. explored the problem in directed dynamic graphs [27]. In our work, we do not consider secrecy, and we address the consensus problem rather than the single source-receiver pair problem. Moreover, our work addresses both deterministically correct and randomized algorithms for consensus. There has also been work [29, 37] on the problem of achieving reliable broadcast with a fault-free source in the presence of local Byzantine faults, which proved tight conditions on the underlying graphs. In this paper, we consider the consensus problem instead of the reliable broadcast problem; furthermore, we allow any node to be faulty.
Appendix B Necessity of Condition CCA
The necessity proof is similar to the necessity proof of Condition CCA in [38].
If graph $G$ does not satisfy Condition CCA, then no iterative one-hop algorithm can achieve asynchronous approximate consensus in $G$.
Proof.
The proof is by contradiction. Suppose that there exists an iterative one-hop algorithm that achieves asynchronous approximate consensus in $G$, and that $G$ does not satisfy Condition CCA. That is, there exists a node partition such that are nonempty, and .
Let denote the set of nodes that have outgoing links to nodes in , i.e., . Similarly define . Since and , we have that for every , and for every , .
Consider a scenario where (i) each node in has input 0; (ii) each node in has input ; (iii) nodes in (if nonempty) have arbitrary inputs in ; (iv) no node crashes; and (v) the message delay for communication channels from to and from to is arbitrarily large compared to all the other channels.
Consider nodes in . Since messages from the set take arbitrarily long to arrive at the nodes in , and for every , , from the perspective of node , its incoming neighbors in appear to have crashed. The latter follows from the fact that the algorithm is one-hop, i.e., the case that for every , cannot be excluded by the messages exchanged in , and thus there is an execution in which all their neighbors in have crashed. Thus, nodes in must decide on their output without waiting to hear from the nodes in . Consequently, to satisfy the validity property, the output at each node in has to be 0, since 0 is the input of all the nodes in . Similarly, nodes in must decide their output without hearing from the nodes in ; they must choose their output as , because the input at all the nodes in is . Thus, the agreement property is violated, since the difference between outputs at fault-free nodes is not . This is a contradiction. ∎
Appendix C Sufficiency of Condition CCA
We first prove a useful lemma.
Assume that $G$ satisfies Condition CCA. Consider a partition of $V$ such that and are nonempty. If , then set propagates to set .
Proof.
Since are nonempty, and , we have that holds, by setting in Condition CCA.
Define and . Now, for a suitable , we will build propagating sequences and inductively.

Recall that and . Since , . Define and .
If , then , and we have found the propagating sequence already.
If , then define , and . Since , . Therefore, Condition CCA implies that . That is, .

For increasing values of , given and , where , by following steps similar to the previous item, we can obtain and , such that either or .
In the above construction, is the smallest index such that . ∎
Proof of Lemma 3.1
Proof.
Consider two cases:

: Then by Lemma C above, propagates to , completing the proof.

: In this case, consider two subcases:

propagates to : The proof in this case is complete.

does not propagate to : Recall that . Since does not propagate to , propagating sequences defined in Definition 3.1 do not exist in this case. More precisely, there must exist , and sets and
