Self-healing Routing and Other Problems in Compact Memory

03/08/2018 ∙ by Armando Castañeda, et al. ∙ Loughborough University

This paper looks at the question of designing distributed algorithms for the setting of compact memory, i.e., sublinear (in n, the number of nodes) bits for connected networks of arbitrary topologies. The nodes in our networks may have much lower internal memory (say, O(polylog n) bits) compared to the number of their possible neighbours. This implies that a node needs to run the algorithm and do computations without even being able to store the states or IDs of all its neighbours. Towards this end, we introduce the Compact Memory Passing (CMP) model: the standard message passing model at a finer granularity, where a node can interleave reads and writes with internal computations. This is required for meaningful computation given the low memory requirement and is somewhat akin to a distributed network whose nodes execute streaming algorithms. We believe this model captures features of large networks of small memory devices (e.g. the Internet of Things (IoT)) and initiates research into theoretical foundations in this area. Compact Routing Messages in Self-Healing Trees (Distributed Computing 2017) introduced the compact self-healing routing algorithm CompactFTZ assuming `regular' memory for preprocessing and posed the problem of its compact preprocessing. We solve this problem and, hence, introduce the first fully compact self-healing routing algorithm. In the process, we also give independent fully compact algorithms for the ForgivingTree [PODC 2008] and Thorup-Zwick's tree based compact routing [SPAA 2001], and for the fundamental problems of leader election, tree constructions and traversals (BFS, DFS, spanning trees and convergecast). Our nodes have only O(log^2 n) bits of local memory, but the preprocessing can be accomplished using O(log n)-bit messages (as in the CONGEST model). We also give a faster solution for O(log^2 n)-bit messages.


1 Introduction

Large networks of low memory devices such as the Internet of Things (IoT) are expected to introduce billions of very weak devices that will need to solve distributed computing problems to function effectively. In this paper, we attempt to formalise the development of distributed algorithms for such a scenario of large networks of low memory devices. We decouple the internal working memory of a node from the memory used by ports for ingress (receiving) and egress (transmitting) (e.g. the ingress queue (Rx) and the egress queue (Tx)), which cannot be used for computation. Thus, in an arbitrary network of n nodes, nodes with smaller internal memory (sublinear in n bits) may need to support a larger number of connections (up to n-1). To enable this, we introduce the Compact Memory Passing (CMP) model: the standard synchronous message-passing model at a finer granularity, where each process can interleave reads from and writes to its ports with internal computation using its low (sublinear in n bits) memory. We give the first algorithms for several classic problems in the model, such as leader election (by flooding), DFS and BFS spanning tree construction and traversals, and convergecast. We build on these to develop the first fully compact distributed routing, self-healing and self-healing compact routing algorithms. We note that, with careful construction, some (but not all) solutions incur almost no additional time/message overhead compared to regular memory message passing.

There has been intense interest in designing efficient routing schemes for distributed networks [53, 50, 3, 13, 55, 19, 30, 1, 7], with compact routing trading stretch (the factor increase in routing length) for memory used. In essence, the challenge is to use sublinear (in n) bits of memory per node, overcoming the need for large routing tables or/and packet headers. Our present setting has a similar ambition: what can be done if nodes have limited working memory even if they may have large neighbourhoods? In fact, we define local memory as compact if it is o(n) bits and, by extension, an algorithm as compact if it works in compact memory.

We see two major directions for extending the previously mentioned works. Firstly, a routing scheme consists of two parts: a pre-processing algorithm (scheme construction) and a routing protocol [27]. The routing results mentioned above assume sequential centralized pre-processing. Since routing is inherently a distributed networks problem, it makes sense to have the scheme construction distributed too, and this has led to a recent spurt in designing efficient pre-processing algorithms for compact routing schemes [44, 29, 45, 20, 21]. These algorithms do not have explicit constraints on internal working memory; in essence, they choose to conserve space (for other purposes). Our interpretation is stricter: we develop a pre-processing scheme (for a routing scheme from [55]) assuming that nodes do not have any excess space and, therefore, have to develop the whole solution in compact memory itself. Moreover, our solutions are deterministic, unlike the solutions listed above, though those tackle a broader range of routing schemes than we do.

Secondly, deterministic routing schemes, in the preprocessing phase, rely on discovery and efficient distributed ‘encoding’ of the network’s topology to reduce the memory requirement (a routing scheme on an arbitrary network with no prior topology or direction knowledge would essentially imply large memory requirements). This makes them sensitive to any topology change and, hence, it is challenging to design fault-tolerant compact routing schemes. There has been some work in this direction, e.g. in the dynamic tree model [38, 36] or with additional capacity and rerouting in anticipation of failures [10, 9, 12, 17, 25, 28]. Self-healing is a responsive fault-tolerance paradigm seeking minimal anticipatory additional capacity and has led to a series of works [57, 52, 34, 58, 47, 48, 33, 54] in the recent past on maintaining topological properties (connectivity, degrees, diameter/stretch, expansion etc.). Algorithms were also proposed to ‘self-heal’ computations, e.g. [51]. Combining the above motivations, [6] introduced CompactFTZ, a fault-tolerant compact routing solution in the (deletion only) self-healing model where an omniscient adversary attacks by removing nodes and the affected nodes distributively respond by adding back connections. However, as in previous routing schemes, CompactFTZ’s pre-processing assumed large (not compact) memory. This paper addresses that important problem, developing a compact deterministic pre-processing algorithm for CompactFTZ. We also develop a compact deterministic pre-processing algorithm for CompactFT (a compact version of ForgivingTree [34]). This leads to a fully compact (i.e. completely distributed and in compact memory) routing scheme, a fully compact self-healing routing scheme and a fully compact self-healing algorithm.

Our model:

In brief (detailed model in Section 2), our network is an arbitrary connected graph over n nodes. Each node has a number of uniquely identified communication ports. Nodes have o(n) bits of working memory (we need only O(log^2 n) bits for our algorithms). However, a node may have up to n-1 neighbours. Note that a node has enough ports for unicast communication with neighbours, but port memory is specialised for communication and cannot be used for computation or as storage space. Also note that the size of the messages is upper bounded by the memory size (in fact, we only need O(log n)-bit messages, as in the CONGEST model [49]). In standard synchronous message passing, in every round, a node reads the messages of all its neighbours, does some internal computation and then outputs messages. Our nodes cannot copy all the messages to the working space; hence, in our model, nodes interleave processing with reads and writes as long as each port is read from and written to at most once in a round. Hence, a round may look like ⟨P, R1, P, W1, R2, P, W2, …⟩, where P, R and W stand for processing, reading and writing (subscripted by port numbers). As in regular message passing, all outgoing messages are sent to the neighbours to be delivered by the end of the present round. The order of reads may be determined by the node ((self-)deterministic reads) or by an adversary (adversarial reads), and the order of writes by the node itself. We call this the Compact Memory Passing (CMP) model.

The model can also be viewed as a network of machines, with each node locally executing a kind of streaming algorithm where the input (of at most Δ items, where Δ is the number of ports) is ‘streamed’, with each item seen at most once and the node computing/outputting results with the partial information. Our self-healing algorithms are in the bounded memory deletion-only self-healing model [6, 57], where nodes have compact memory and, in every round, an omniscient adversary removes a node but the nearby nodes react by adding connections to self-heal the network. However, their preprocessing requires only the CMP model. The detailed model is given in Section 2.

General solution strategy and an example:

A general solution strategy in the CMP model is to view the algorithms as addressing two distinct (but related) challenges. The first is that the processing after each read is constrained to be a function of the (memory limited) node state and the previous read (as in a streaming or online fashion), and it results in an output (possibly NULL) which may be stored or sent as an outgoing message. We will refer to such a function as a local compact function. The second part is to design what we call a compact protocol, which solves the distributed problem of passing messages to efficiently solve the chosen problem in our model. We discuss local compact functions a bit further in our context. A simple compact function is the function max, which simply outputs the maximum value seen so far. A more challenging problem is to design a function that outputs the neighbourhood of a node in a labelled binary tree. Consider the following question: give a compact function that, given as input the number of leaves n and any leaf node v, returns the neighbourhood of v in a balanced binary search tree of n leaves with internal nodes repeated as leaves and arranged in ascending order from left to right. Note that the function should work for a tree of any size without generating the whole tree (due to limited memory). Figure 1 illustrates the question (further background is in Section 3); the solution to a similar question (given in Section 5.1) forms the crux of our fully compact algorithms. It is also a question of interest whether this approach could be generalised to a generic function that, when given a compact description of a structure (in our case, already encoded in the function), generates relevant compact substructures on demand when queried.

Figure 1: Compact function to query labelled BST trees/half-full trees (Section 5.1): on the left is such a tree with 5 leaves; querying a leaf returns the second box (the relevant subtrees for that leaf).
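To make the flavour of such a query concrete, here is a minimal sketch of a locally compact function over an implicit balanced binary tree. This is our own simplified analogue of the paper's function, not its SearchHT: with heap-style numbering, the neighbourhood of any node follows from index arithmetic alone, using O(1) words and never materialising the tree. (The paper's actual function additionally handles BST leaf labels and half-full trees.)

```python
# Sketch (ours): a locally compact query on an implicit balanced binary
# tree. The tree is never built; the neighbourhood of node i is computed
# from its index alone. Heap numbering: root = 1, children of i are
# 2i and 2i + 1, so the parent of i is i // 2.

def neighbourhood(i, n_nodes):
    """Return (parent, left child, right child) of node i in an implicit
    complete binary tree with nodes numbered 1..n_nodes; None marks a
    neighbour that does not exist."""
    parent = i // 2 if i > 1 else None
    left = 2 * i if 2 * i <= n_nodes else None
    right = 2 * i + 1 if 2 * i + 1 <= n_nodes else None
    return parent, left, right
```

The point is the memory pattern: the whole structure is "already encoded in the function", and only the queried substructure is ever produced.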
Our results:

Our results follow. We introduce the CMP model, hoping it will provide a formal basis for designing algorithms for devices with low working memory in large scale networks. As mentioned, we introduce a generic solution strategy (compact protocols with compact functions) and, in Section 5.1 (cf. Lemma 5.6), we give a compact function of independent interest that compactly queries a labelled binary tree (too big to fit into memory). We give some deterministic algorithms in the CMP model, as summarised in Table 1. We do not provide any non-obvious lower bounds, but for some algorithms it is easy to see that the solutions are optimal and suffer no overhead due to the lower memory as compared to regular message passing (denoted with a ‘*’ in Table 1). In general, it is easier to construct (by broadcast) and maintain spanning trees using a constant number of parent and sibling pointers, and effectively do bottom-up computation, but unicast communication of a parent with its children may suffer due to the parent’s lack of memory (with the parent possibly resorting to broadcast). We solve preprocessing for the compact routing scheme ([6], based on [55]), the compact self-healing scheme ([6], based on [34]) and CompactFTZ, as summarised in Theorem 1.1, leading to fully compact routing, self-healing and self-healing routing solutions (in conjunction with [6]) as Corollaries 1.2 to 1.4. Note that, combining with the tree cover results from [56], our algorithms could be extended to obtain low stretch routing for general graphs.

[tbh]
Algorithm — In Paper
Leader Election by flooding — Section 1.1
BFS Spanning Tree Construction — Section 4.1
(regular) Convergecast — Section 4.2
Weight labelling with convergecast — Section 4.2
Compact DFS relabelling of a tree — Section 4.3
further Compact DFS walks — Section 4.3
‘Light’ Path by BFS construction — Section 4.3
Wills (w/ half-full tree labelling) setup, with deterministic reads — Section 5.2
…with adversarial reads — Section 5.3
Preprocessing — Th. 1.1.1
Preprocessing with adversarial reads — Th. 1.1.2
Preprocessing with deterministic reads — Th. 1.1.2
Preprocessing with adversarial reads — Th. 1.1.4
Preprocessing with deterministic reads — Th. 1.1.3

  • Δ: Upper bound on the number of ports of a node

  • *: No additional overhead in comparison with regular message passing.

Summary of the algorithms in the CMP model in this paper. Results apply to both deterministic and adversarial reads unless otherwise indicated.

Theorem 1.1.

In the CMP model, given a connected synchronous network of n nodes and m edges, with O(log^2 n) bits of local memory per node:

  1. Algorithm preprocessing (deterministic / adversarial reads) can be done using bits size messages in time and messages, or using bits size messages in time and messages.

  2. Algorithm preprocessing can be done using bits size messages in time and messages for deterministic reads, or time and messages for adversarial reads.

  3. Algorithm preprocessing can be done using bits size messages in time and messages for deterministic reads or time and messages for adversarial reads.

  4. Algorithm preprocessing can be done using bits size messages in time and messages for deterministic reads, or time and messages for adversarial reads.

where Δ is an upper bound on the number of ports of a node.

Corollary 1.2.

There is a fully compact and distributed variant of the Thorup-Zwick tree based routing scheme [55] for a network of n nodes with O(log^2 n) bits of memory, O(log n)-bit routing tables, and O(log^2 n)-bit labels and message size.

Corollary 1.3.

There is a fully compact and distributed variant of the ForgivingTree self-healing algorithm [34] for a network of n nodes with O(log^2 n) bits of memory and message size.

Corollary 1.4.

There is a fully compact and distributed variant of the CompactFTZ self-healing routing algorithm [6] for a network of n nodes with O(log^2 n) bits of internal memory.

1.1 Warm up: Leader Election by Flooding

As a warm up, let us implement flooding and use it to elect a leader. Assume that a node v has a message m in its memory that needs to be flooded on the network. Node v executes the primitive Broadcast (Sec. 2.1): v sweeps through its ports in order, copying m to every port to be sent out in the next round. In the next round, all nodes, in particular the neighbours of v, read through their ports in deterministic or adversarial order and receive m from v. m is copied to the main memory and subsequently broadcast further. To adapt the flooding algorithm for leader election, assume for simplicity that all nodes wake up simultaneously, have knowledge of the diameter D, and elect the highest ID as leader. Since every node is a contender, it will broadcast its own ID: say, v broadcasts its ID in the first round. In the next round, every node may receive a different message from each of its neighbours.

  Init :
      leader ← own ID
      updated ← true
      broadcast ⟨leader⟩

  BeginRound :
  on ⟨id⟩ received from port p:
      if id > leader then
          leader ← id
          updated ← true

  EndRound :
      if updated then
          broadcast ⟨leader⟩
          updated ← false
Algorithm 1 Leader Election By Flooding Rules :

Since a node may have a large number of neighbours, it cannot copy all these IDs to the main memory (as in standard message passing) and deduce the maximum. Instead, it will use the interleaved processing in a streaming/online manner to find the maximum ID received in that round. Assume that a node v has a few neighbours {u1, u2, …} and the reads are executed in the order Ru1, Ru2, and so on. To discover the maximum received, v simply compares the new read against the highest it has: let us call this function max (this is a locally compact function). Therefore, v now executes, in an interleaved manner, Ru1, max, Ru2, max, and so on. At the end of the round, v has the maximum seen so far. Every node executes this algorithm for D synchronous rounds to terminate with the leader decided. Note the algorithm can be adapted to other scenarios such as non-simultaneous wakeup and knowledge of n (not D) with larger messages or more rounds. Without knowledge of bounds of n or D, an algorithm such as in [41] (Algorithm 2) can be adapted (not discussed in this paper). The pseudocode, given as Algorithm 1, leads to the local variable leader being set to the leader’s ID. The local variables of a node used in our algorithms are given in Table 1 along with the stage at which they are set to the right value.
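The streaming-max flooding above can be sketched as the following round-by-round simulation (function and variable names are ours): each node folds incoming IDs into a single retained maximum instead of buffering them, mirroring the interleaved Ru, max, Ru, max execution.

```python
# Illustrative simulation (ours) of leader election by flooding in the
# CMP style: per node, only the best ID seen so far is kept (O(log n)
# bits of state); each incoming message is folded in one at a time.

def elect_leader(adj, diameter):
    """adj: dict node -> list of neighbours (connected graph).
    Returns each node's final view of the leader (the maximum ID)."""
    leader = {v: v for v in adj}                 # every node is a contender
    for _ in range(diameter):
        outgoing = {v: leader[v] for v in adj}   # broadcast current best
        for v in adj:
            for u in adj[v]:                     # reads interleaved with
                if outgoing[u] > leader[v]:      # the compact `max` step
                    leader[v] = outgoing[u]
    return leader

# Path 1-2-3-4 (diameter 3): all nodes converge on ID 4.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

This harness itself uses regular memory; the compact-memory claim concerns each simulated node, whose per-round state is a single ID.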

2 Model

We assume a connected network of arbitrary topology represented by an undirected graph G = (V, E), with n nodes and m bidirectional links. Every node has compact internal memory (of size o(n) bits), a unique ID and a collection of ports (interfacing with the bidirectional links), each with a locally unique port-id. Each port has an in-buffer that can be read from and an out-buffer that can be written to. Note that the ports need not be physical devices but may be uniquely identifiable virtual interfaces, e.g. unique frequencies in wireless or sensor networks. Also, the neighbours need not be on contiguous ports, i.e. there may be ‘dead’ ports interspersed with live ones. This may happen, for example, when, even though starting from contiguous ports, certain neighbours get deleted (as in our self-healing scenarios) or subnetworks (e.g. spanning trees) are generated from the original network. Therefore, our algorithms have to be aware of such ‘dead’ ports. For this work, we assume a synchronous network, i.e. the communication between the nodes proceeds in synchronous rounds.

2.1 Compact Memory Passing model

In this work, we are interested in overcoming the assumption of unlimited computation power of nodes by restricting their internal memory. This is a natural condition for many real life networks, such as the increasingly prevalent networks of low memory devices as in the Internet of Things (IoT), and for applications such as compact routing (which limit memory usage). The main criterion is to limit node memory to o(n) bits. We do not ask for a bound on the degree of nodes in the network. This implies that a node may not even be able to store the IDs of all its neighbours in its internal memory. A parametrised version would be a (B,S)-compact model, where S is the local memory size and B is the maximum size of a message exchanged on each edge. For example, we could be interested in a sublinear CONGEST model with B = O(log n) and S = o(n). Notice that B cannot exceed S, since a node needs to generate the message in its internal memory before sending it. The case of B = O(log n) would be naturally interesting, since this would be comparable with the standard CONGEST model (with low internal memory) while allowing applications such as compact routing. Since a node might not be able to store all the information sent to it in a round of communication, to allow meaningful computation in compact memory we need to revisit the standard message passing model (at a finer granularity). Hence, we introduce the Compact Memory Passing (CMP) model and its variants.

In the standard synchronous message passing model, nodes are assumed to have unlimited computation. In each round, a node reads all its ports, possibly copying received messages to internal memory, processes the inputs preparing messages, which are then written to the ports for transmission. However, our nodes, having compact memory, cannot store the inputs (or even the IDs) of all their neighbours. Hence, we propose the following model with a streaming style of computation.

Compact Memory Passing (CMP) model:

Communication proceeds in synchronous rounds. Internally, in every round, every node executes a sweep of its ports fulfilling the following conditions:

  1. Mutable reads condition: If a read is executed on an in-buffer, the value in that buffer is cleared i.e. not readable if read again.

  2. Fair interleaving condition: In a sweep, a node can read and write to its ports in any order, interleaving the reads and writes with internal processing, i.e. ⟨P, R1, P, W1, R2, P, W2, …⟩, where P, R and W stand for processing (possibly none), reading and writing (subscripted by port numbers). For example, ⟨R1, W1, R2, W2⟩ and ⟨R2, P, R1, W1⟩ are valid orders. Note that the memory restriction bounds the local computation between reads and writes in the same round. We say that such computations are given by locally compact functions, where a locally compact function takes as input the previous read and the node state to produce the next state and message(s). E.g., in the extreme case of constant local memory, the local computation between reads and writes is constant too.

    1. (Self-)deterministic reads: the node chooses the order of ports to be read from and written to, provided that, in a sweep, a port is read from and written to at most once. Note that the node can adaptively compute the next read based on previous reads in that sweep.

    2. adversarial reads: An adversary decides the order of reads i.e. it picks one of the possible permutations over all ports of the node. The order of writes is still determined by the node itself. A special case is a randomized adversary that chooses a random permutation of reads. A strong adversary can adaptively choose the permutation depending on the node state.

We define the following primitives: “Receive m from p” reads the message m from the in-buffer of port p to the internal memory; “Send m via p” writes the message m to the out-buffer of port p; and “Broadcast m” writes the message m on every port of the node for transmission. Since condition 2 limits writes to one per port, we also define a primitive “Broadcast m except L”, which will ‘send’ the message on each port except the ones listed in L. This can be implemented as a series of Sends where the node checks L before sending the message. Notice that L has to be either small enough to fit in memory or a predicate which can be easily computed and checked. For ease of writing, we will often use the above primitives in text in a more informal manner in regular English usage, i.e. receive, send, broadcast, and ‘broadcast except to …’, where there is no ambiguity. A message is of the form ⟨type, arguments⟩.
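A minimal sketch of the write-side primitives (class and method names are ours): broadcast-except is just a series of Sends that consult the exclusion list before each write, which is why that list must fit in compact memory or be an easily checked predicate.

```python
# Sketch (ours) of a node's per-port out-buffers and write primitives.
# Each port is written at most once per round, as the model requires.

class Ports:
    def __init__(self, num_ports):
        self.out = [None] * num_ports     # one out-buffer per port

    def send(self, msg, port):
        self.out[port] = msg              # Send m via p

    def broadcast(self, msg):
        for p in range(len(self.out)):    # Broadcast m: write every port
            self.send(msg, p)

    def broadcast_except(self, msg, excluded):
        for p in range(len(self.out)):
            if p not in excluded:         # check the (small) list before
                self.send(msg, p)         # each individual Send
```

Replacing the `excluded` set with a predicate function would cover the case where the exclusion rule is computed rather than stored.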

Figure 2: The CMP model can be viewed as each node executing a kind of streaming algorithm, with read-only input and write-only output tapes specifying connections to neighbours.

The CMP model can be viewed as each node executing some kind of streaming algorithm (ref. Figure 2). The incoming messages form a continuous stream in a permutation decided by an adversary (or the algorithm itself). The internal algorithm reads this stream and produces, using its restricted local memory, a stream of outgoing messages that fills the out-buffers.

3 Background (CompactFTZ) and paper layout

CompactFTZ: Compact preprocessing followed by compact self-healing routing

1:   Given a distinguished node (e.g. by compact leader election (Sec. 1.1))
2:   A BFS spanning tree of graph G (Sec. 4.1)
3:   Setup of heavy arrays by compact convergecast (Sec. 4.2)
4:   DFS traversal and labelling (renaming) of the nodes (Sec. 4.3)
5:   Setup of light levels by BFS traversal (Sec. 4.4)
6:   Setup of Wills (Sec. 5)
7:  while true do
8:     if a vertex v (with parent p) is deleted then // Self-healing [6]
9:         if v was not a leaf (i.e., had any children) then // Fix non-leaf deletion
10:            v’s children execute v’s Will using the will portions they have; v’s heir takes over v’s duties.
11:            All affected Wills are updated by a simple update of the relevant will portions.
12:         else // Fix leaf deletion
13:            if p is real/alive then // Update Wills by simulating the deletion of v
14:                p informs its children about the deletion; they update their Wills exchanging messages via p
15:            else // p had already been deleted earlier
16:                Let h be p’s leafheir; h and the affected nodes update their Wills.
17:     if a message headed for node t is received at node v then // Compact Self-Healing Routing
18:         if v is a real node then // Deliver over regular network via compact routing scheme
19:            If v = t, the message has reached its destination; else forward to the parent, to a light node or to a heavy node through the appropriate port
20:         else // v is a virtual helper node
21:            If v = t, the message has reached its destination; else traverse the reconstruction tree in a binary search manner
Algorithm 1 CompactFTZ with preprocessing: A high level view

Here, we give a brief background of CompactFTZ and its components, referring to the relevant sections for our solutions. Note that some proofs and pseudocodes have been omitted from the paper due to lack of space. Algorithm 1 captures the essential details of CompactFTZ (and of its self-healing and routing components). These algorithms, like most we referred to in this paper, have a distinct preprocessing and main (running) phase. The data structures are set up in the preprocessing phase to respond to events in the main phase (node deletion or message delivery). First, let us consider the intuitive approach. CompactFTZ is designed to deliver messages between sender and receiver (if the receiver has not been adversarially deleted) despite node failure (which is handled by self-healing). Self-healing works by adding virtual nodes and new connections in response to deletions. Virtual nodes are simply logical nodes simulated in real (existing) nodes’ memories. Thus, the network over time is a patchwork of virtual and real nodes. It is now possible (and indeed true in our case) that the routing scheme may not work over the patched (self-healed) network, and the network information may be outdated due to the changes. Thus, the composition has two distinct routing schemes and has to ensure smooth delivery despite outdated information. Nodes then respond to the following events: i) node deletion (line 8): self-heal, moving from the initial graph G0 to G1 and so on (the i-th deletion yielding Gi), or ii) message arrival (line 17): messages are forwarded using the routing scheme or the second scheme (which is simply binary search tree traversal in our case).

Consider the self-healing component CompactFT. It seeks to limit diameter increase while allowing only a constant (+3) node degree increase over any sequence of node deletions. Starting with a distinguished node (line 1), it constructs a BFS spanning tree in the preprocessing (line 2) and then sets up the healing structures as follows. A central technique used in topological self-healing is to replace the deleted subgraph by a reconstruction subgraph of its former neighbours (and possibly virtual nodes simulated by them). These subgraphs have been drawn from graph families such as balanced binary search trees [34], half-full trees [33], random r-regular expanders [48] and p-cycle deterministic expanders [47]. Figure 3 illustrates this, where the star graph of a deleted node v is replaced by the Reconstruction Tree of v. In preprocessing (line 6), every node constructs its Reconstruction Tree (also called its Will) in memory and distributes the relevant portions (called will portions) to its neighbours so that they can form the Reconstruction Tree if it is deleted. However, since nodes do not have enough memory to construct their Will, they rely on a compact function to generate the relevant will portions. Referring back to Figure 1, the tree in the figure can be thought of as the Reconstruction Tree of a deleted node (or its Will before demise) and the subgraphs in the boxes as the will portions (one per neighbour). The node queries the compact function SearchHT (Algorithm 2) to generate them. Once these structures have been set up in preprocessing, the main phase consists of ‘executing’ the Wills, i.e. making the new edges upon deletion and keeping the Wills updated. The actions differ for internal and leaf nodes; cf. [6] for details.

Figure 3 [6, 34]: The deleted node v is replaced by a reconstruction tree with v’s ex-neighbours forming the leaves and simulating the internal virtual nodes.

Now consider the routing scheme, a postorder variant of the tree routing scheme of [55]. The scheme is wholly constructed in the preprocessing phase; the original paper does not give a distributed construction. Here, we give a compact distributed construction. On a rooted spanning tree (the BFS tree obtained for the self-healing setup above), every node is marked heavy if it heads a subtree of more than a constant fraction of its parent’s descendants, and light otherwise. References to the (constantly many) heavy children are stored in an array, with the corresponding ports in a second array, making a routing table. We do this by a compact convergecast (line 3). A DFS traversal prioritised by heavy children follows; nodes are then relabelled by their DFS numbers (line 4). Lastly, for every node, its path from the root is traced and the light nodes on the way (of which there are at most O(log n)) are appended to its new label (line 5). Every node now gets a ‘light level’ as the number of light nodes on its path. Note that the label is of O(log^2 n) bits, requiring our algorithms to use O(log^2 n) bits of memory. All other parts require only O(log n) bits. This yields a compact setup of the routing scheme. When a packet arrives, a real node checks its parent and heavy arrays for the DFS interval, failing which it uses its light level and the receiver’s label to route the packet through light nodes. If a packet comes to a virtual node, binary search traversal is used, since our reconstruction trees are binary search trees. Interestingly, even though the arrays, light levels etc. get outdated due to deletions, [6] shows that routing continues correctly once set up.
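The interval test a real node performs can be illustrated with plain DFS-interval tree routing, a deliberate simplification that ignores the heavy/light machinery and uses our own names: given the subtree intervals induced by the DFS relabelling, a constant-memory check decides the next hop.

```python
# Sketch (ours) of interval-based tree routing, the idea underlying the
# scheme above: after DFS relabelling, each node knows the DFS interval
# of its subtree, so one comparison per child decides whether a packet
# goes down (to the child whose interval contains the target) or up.

def next_hop(node, target, interval, children):
    """node: this node's DFS label; interval: (lo, hi) of its subtree;
    children: list of (child_label, (lo, hi)) pairs.
    Returns the next hop label, "parent", or None on arrival."""
    if node == target:
        return None                       # packet has arrived
    lo, hi = interval
    if lo <= target <= hi:                # target lies in our subtree:
        for child, (clo, chi) in children:
            if clo <= target <= chi:      # exactly one child interval
                return child              # can contain it
    return "parent"                       # otherwise route upward
```

In the paper's scheme the downward choice is further split between the heavy-child arrays and the light-node information carried in the target's label; this sketch only shows why a node never needs more than its own interval and its children's.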

4 Some Basic Tree Algorithms and Preprocessing

We present here three distributed algorithms related to trees: (1) BFS traversal and spanning tree construction, (2) convergecast and (3) DFS traversal, tree construction and renaming. We present these independently and also adapt them in the context of the preprocessing of our self-healing and routing schemes. The general algorithms can easily be adapted for other problems; for example, the BFS construction can be adapted to compute compact top-down recursive functions, convergecast for aggregation and bottom-up recursive functions, and DFS to develop other priority-based algorithms.

: the total number of neighbours (already set)
: a boolean to recognise the root of the tree (set by Alg. 1)
: the ID of the root of the tree (set by Alg. 1)
and : the ID and port of the parent in the BFS tree (set by Alg. 1)
: the number of nodes that accepted the node as parent (set by Alg. 1)
: the weight of the node (set by Alg. 2)
: a boolean to say if the node is a heavy child (of its parent) (set by Alg. 2)
and : the lists of heavy node IDs and ports (set by Alg. 3)
: a port id (set by Alg. 3)
: the ID of one port from the parent (set by Alg. 3)
: an integer representing the position of the node in the list of the children of its parent (set by Alg. 3)
: the ID of the last used port (set by Alg. 3)
: the new ID given by the server (set by Alg. 3)
: the smallest of the new IDs among all the nodes in the subtree (set by Alg. 3)
: list of needed node information: a counter of uses, the node ID and the port ID (set by Alg. 4)
: list of partially computed will parts (set by Alg. 4)
Table 1: Table of variables of a node (used by the algorithms in Section 1.1 and Sections 4.1 to 5)

4.1 Breadth First Traversal and Spanning Tree Construction

  Init :
  if isRoot then
      broadcast ⟨join⟩

0: on ⟨accepted⟩ from a port:
  increment the accepted counter
  if every port has been accounted for then
      Terminate

0: on ⟨join⟩ from port p:
  if not yet in the tree then
      parent, parentPort ← sender ID, p
      send ⟨accepted⟩ via p
      broadcast ⟨join⟩ except p
  else
      increment the join counter
  if every port has been accounted for then
      Terminate
Algorithm 1 BFS Tree construction Rules :

We assume the existence of a Leader (Section 1.1). Namely, each agent has a boolean variable that is set for exactly one agent; this Leader will be the root of our tree construction. The construction follows the classic breadth-first tree construction. The root broadcasts a message to all its neighbours. When receiving such a message for the first time, a node joins the tree: it sets its parent ID and port variables from the sender, answers with an acceptance, and broadcasts further. It ignores all subsequent join messages. To ensure termination, each node counts the acceptance and rejection messages it has received so far, terminating the algorithm when the count equals the number of its neighbours.
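The construction above can be sketched as a round-by-round simulation. The synchronous loop and the adjacency-list representation are illustrative; this abstracts away the message-level accept/reject bookkeeping of Algorithm 1.

```python
# Round-by-round sketch of BFS spanning tree construction from a leader.
def bfs_tree(adj, root):
    """adj: dict node -> list of neighbours; returns (parent pointers, rounds)."""
    parent = {root: None}               # the leader is in the tree from the start
    frontier = [root]
    rounds = 0
    while frontier:                     # one communication round per level
        nxt = []
        for u in frontier:
            for v in adj[u]:            # u broadcasts a join message
                if v not in parent:     # v accepts only the first join it sees
                    parent[v] = u
                    nxt.append(v)
        frontier = nxt
        rounds += 1
    return parent, rounds
```

One level of the tree is fixed per round, which is the intuition behind the O(D)-round bound of Lemma 4.1.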

Lemma 4.1.

After O(D) rounds, where D is the diameter of the network, every node has set its parent pointer and reached the terminated state.

Proof.

The network is connected, therefore a path of length at most D exists between the root and any node. The root is in the tree from the start. While not all the nodes of the path have joined, at least one node of the path joins the tree in each round. Hence after at most D rounds all the nodes of the path have joined.

After joining, a node sends a message on each link: an acceptance to its parent and a rejection to the others. The nodes do not send any other messages. This means that each node will receive exactly one message from each link. Each message increments one of the counters, except for the accepted message. Eventually the sum of those counters reaches its maximum value and the node enters the terminated state. ∎

Lemma 4.2.

There is no cycle in the graph induced by the pointers.

Proof.

A node sends a message only after it has joined the tree, and a node accepts a message only from a node already inside the tree. If we label each node with the time at which it joined the tree, a parent has a time-label strictly smaller than that of any of its children. This implies there cannot be a cycle.

Another, simpler, argument is to notice that we have a connected graph with n vertices and n-1 edges (each node has one parent except the root); hence it is a tree. ∎

Corollary 4.3.

Algorithm 1 constructs a BFS spanning tree in O(D) rounds using O(m) messages overall, where m is the number of edges.

4.2 Convergecast

  Init :
  if  then
       ;
      send  via
  else
       ;
  BeginRound :
  if  then
       send  via
0:  send  from :
   ;
  Terminate
0:   from :
   ;
  if   then
       send  via
      

0:   from :
  if   then
      if   then
          send  via
          insert in ; insert in
      else
          send  via
Algorithm 2 Weight Computation by Convergecast - Rules :

We present a distributed convergecast algorithm assuming the existence of a rooted spanning tree, as before, with every node having a pointer to its parent. We adapt it to identify heavy and light nodes for the preprocessing. The weight of a node is 1 if it is a leaf, and the sum of the weights of its children otherwise.

For a given constant fraction, a node is heavy if its weight exceeds that fraction of its parent’s weight; else it is light.

Algorithm 2 computes the weight of every node in the tree, with each node also storing the IDs and ports of its heavy children in two lists. It is easy to see that a node can have at most a constant number of heavy children, so these lists are of constant size. To compute whether it is a heavy child, a node has to wait for its parent to receive the weights of all its children. The parent could then broadcast the answer, or the child could continuously send queries until it receives an answer (as in Algorithm 2). Note that the broadcast version accomplishes the same task with a different round and message trade-off, so either could be preferable depending on the graph.
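A sequential sketch of the quantities Algorithm 2 computes bottom-up. The parameter `c` (a node is heavy when its subtree holds more than a 1/c fraction of its parent's weight) is an assumed parameterisation for illustration.

```python
# Bottom-up weight computation and heavy/light classification on a rooted tree.
def weights_and_heavy(children, root, c=2):
    """children: dict node -> list of children; returns (weight, heavy) dicts."""
    weight, heavy = {}, {}
    def visit(u):
        kids = children.get(u, [])
        weight[u] = 1 if not kids else sum(visit(v) for v in kids)
        return weight[u]
    visit(root)
    for u, kids in children.items():
        for v in kids:                  # v is heavy iff its subtree holds more
            heavy[v] = weight[v] * c > weight[u]   # than 1/c of u's weight
    return weight, heavy
```

Since each heavy child accounts for more than a 1/c fraction of its parent's weight, a node has fewer than c heavy children, which is why the heavy lists fit in compact memory.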

Lemma 4.4.

Algorithm 2 accomplishes the weight computation in O(D) rounds with O(n) messages.

Proof.

Consider the nodes ordered by their height in the tree, from 0 for the leaves. After round i, all nodes of height at most i are done. Hence after at most D rounds, all nodes are done.

Each node receives one message from each of its children and sends one to its parent. To compute the heavy children there is one more exchange between parent and children. This means that there is only a constant number of messages per edge of the tree. ∎

4.3 Depth First Walk And Node Relabelling

  Init :
   ;
  if  then
       ;
      if   then
           send  via
      else
          while  not connected do
              
          send  via port
0:   from :
  // Descending token
  if  then
       ;
      if  then
// Send to a heavy child first
          send  via
      else
          // there is no heavy child
          while  and not connected or or  do
              
          if   then
              // No child, sending back the token
              
              
              send 
                       via
          else
              send  via port
  else
      // the message didn’t come from
       send  to
0:   from :
  if  then
       ;
      Terminate

0:   from:
  // Backtracking token
  if  then
      if  then
           ;
      else
          send  via
       ;
  if   then
      // Send to the next heavy child first
       send  via
  else
// there are no more heavy children
      while  and not connected or or  do
          
      if  then
           send  via
          
          if  then
              Terminate
          else
              // No more child, sending back the token
              send 
                       via
      else
          send  via port
Algorithm 3 DFS Traversal Renaming Rules :

The next step in the preprocessing is to relabel the nodes using the spanning tree computed in the previous section. The labels are computed through a post-order DFS walk on the tree, prioritizing the walk towards heavy children. In the algorithm, the root starts the computation, sending the token with its ID set to 1 towards its first heavy child. Once a node gets back the token from all its children, it takes the token’s ID as its own label, increments the token’s ID and sends the token to its parent. Note that in our algorithm, each node has to try all its ports when passing the token (except the port connected to its parent) since it cannot ‘remember’ which ports connect to the spanning tree. Our solution to this problem is to “distribute” that information among the children; the problem is solved while performing the DFS walk. Each node, being the i-th child of its parent, keeps a local variable which stores the port number of its parent leading to the parent’s i-th child. This compact representation of the tree will allow us to be round optimal in the next section.
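The relabelling can be sketched sequentially as follows, assuming each node's child list already has its heavy children first; this abstracts away the token-passing and the port probing of Algorithm 3.

```python
# Post-order DFS relabelling, visiting heavy children first.
def postorder_relabel(children, root):
    """children: dict node -> children (heavy ones assumed listed first)."""
    label, counter = {}, 0
    def walk(u):
        nonlocal counter
        for v in children.get(u, []):   # descend, heavy children first
            walk(v)
        counter += 1
        label[u] = counter              # post-order: labelled on the way back
    walk(root)
    return label
```

In the distributed version the counter travels inside the token, so no node ever stores more than one O(log n)-bit value of it.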

Lemma 4.5.

Algorithm 3 relabels the nodes in a post-order DFS walk in O(m) rounds, using O(m) messages overall.

Proof.

The token walks over the graph by exploring all the edges (not just the tree edges); each edge is used at most 4 times. During each round, exactly one node sends and exactly one node receives a message. ∎

4.4 Computing Routing Labels

0:   from port :
  if  then
      if  then
          
      else
          
      
      for all port  do
          send  via
      Terminate
  Init :
  if  then
      
      
      for all port  do
          send  via
Algorithm 4 Computing Routing Labels with sized messages:

We now have enough information (a leader, a BFS spanning tree, node weights, DFS labels) to produce the routing labels, and hence, to complete the preprocessing. For a node, its light path is the sequence of port numbers taken towards the light nodes on the path from the root to the node. The routing label of a node is the pair of its DFS label and its light path. The second routing label component for the root is empty.

A simple variant of Algorithm 1 computes the routing labels if O(log^2 n) sized messages are permitted (Algorithm 4); otherwise a slower variant can do the same with O(log n) sized messages (Algorithm 5). For the larger-message variant, the root begins by sending its (empty) path to each port along with that port’s number. When a node receives a message from its parent, it sets its light path to the received path extended by its port number if it is light, and to the received path only otherwise, producing its routing label. Then, for each port, it sends its light path together with the port number. For the O(log n) size variant (Algorithm 5), every light node receives from its parent the port number it is on and then does a broadcast labeled with that port number. The root also broadcasts a special message. Every receiving node appends a received port number to its light path, incrementing its light level, and terminates when receiving the root’s message.
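A sequential sketch of what the label computation produces, in the flavour of Algorithm 4: the light path grows by one port at each light node on the way down. The global maps `children`, `heavy` and `port` are assumptions of this sketch; in the algorithms this information is, of course, distributed.

```python
# Top-down computation of light paths on a rooted tree.
def light_paths(children, heavy, port, root):
    """port maps (parent, child) to the parent's port number towards child."""
    path = {root: []}                   # the root's light path is empty
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            # Heavy children inherit the path; light children extend it.
            path[v] = path[u] if heavy.get(v) else path[u] + [port[(u, v)]]
            stack.append(v)
    return path
```

A node's light level is then simply the length of its light path, and the path has at most logarithmically many entries since weights drop by a constant fraction at each light step.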

  Init :
  
  for all port  do
      send  via
  if   then
      Terminate
0:   from port :
  if  and not  then
      
      if  then
          broadcast except 
  else if  then
      broadcast except 
  if   then
      Terminate
0:   from port :
  if  then
      if  then
          
      if  then
          broadcast except 
  if   then
      Terminate
Algorithm 5 Computing Routing Labels with sized messages:
Lemma 4.6.

Algorithm 4 computes the routing labels in O(D) rounds using O(m) messages of O(log^2 n) size.

Proof.

Each node receives at most one message from each neighbour (and takes into consideration only the one coming from its parent), and the longest a message can travel is from the root to a leaf. Each node sends only one message to the server. This then requires O(m) messages and O(D) rounds. ∎

Lemma 4.7.

Algorithm 5 computes the routing labels in O(D) rounds using O(m + n log n) messages of O(log n) size.

Proof.

During the initialization round, each node sends through each of its ports the corresponding port number. During the first round, each node receives those port numbers and considers only the one coming from its parent. If it is a light child, it broadcasts this number in its subtree. In this way each node receives, at different times but in order, the whole chain of ports towards light nodes on its path from the root. The messages initiated by the root are special and are forwarded even by heavy children, since they serve as a termination signal.

At each round, at most one message is sent on any edge, and the longest a message travels is from the root to the furthest leaf. Therefore this requires O(m + n log n) messages and O(D) rounds. ∎

5 Compact Forgiving Tree

Section 3 gives an overview of CompactFT. As stated there, the central idea is a node’s will, which needs to be pre-computed before an adversarial attack. [6] has only a distributed non-compact memory preprocessing stage (once this preprocessing stage is completed, each process uses only compact memory during the execution of the algorithm) in which, in a single round of communication, each node gathers all the IDs from its children, locally produces its will, and then, to each child, sends a part of its will, called a subwill, of size O(log n). Computing the will with compact memory is a challenging problem, as a node might have many neighbours, making its will too large for its memory. Thus, to compute this information in a compact manner, we need a different approach, possibly costing more communication rounds. Remarkably, as we show below, one round is enough to accomplish the same task in our model, with deterministic reads. The solution is made of two components: a local compact function (Section 5.1) that efficiently computes parts of labelled half-full trees using only O(log n) memory, and a compact protocol (Section 5.2) that solves the distributed problem of transmitting the subwills to the children in a single round.

5.1 Computing Half-Full Trees with Low Memory

Half-full trees [33] (which subsume balanced binary trees), redefined below, are the basis for computing the will of a node in CompactFT. At the core of the construction is a labelling of the nodes with good properties that allows the subwills sent to the children of a node to be of size O(log n). Roughly speaking, a half-full tree is made of several full binary trees, each with a binary search tree labelling in its internal nodes. In what follows we show how to compute properties of that labelling using low memory.

5.1.1 Computing labels of full binary trees

Given a power of two m, consider the full binary tree with m leaves defined recursively as follows. The root of the tree is the empty string, and each node x has left child x0 and right child x1. It is easy to see that the nodes at depth d are the binary representations of the integers 0 to 2^d - 1. We write int(x) for the integer represented by the string x. Moreover, for any node x, its left and right children represent the numbers 2·int(x) and 2·int(x) + 1, respectively. Let T denote this tree. We now define a function, used in CompactFT, that labels the nodes of T in the space {1, …, m}. Of course the labelling is not proper, but it has nice properties that will allow us to compute it using low memory.

Consider a node x of T. Let h denote the height of x in T. Then, we define label(x) as follows: if h = 0, label(x) = int(x) + 1; otherwise label(x) = 2^(h-1) + int(x)·2^h.

In words, if x is of height 0, its label is simply int(x) + 1; otherwise its label is computed using a base number, 2^(h-1), plus int(x) times an offset, 2^h. Figure 4 (left) depicts the tree and its labelling. Note that the internal nodes have a binary search tree labelling.

Figure 4: The tree at the left is T with its labelling. Each circle shows in its interior the binary string identifying the vertex and its decimal value. Near each node appears its label. Non-leaf nodes correspond to bold-line circles. The tree at the right is the half-full tree with its labelling.
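A sketch of the labelling function, assuming the form consistent with Lemma 5.1: label(x) = int(x) + 1 for a leaf, and 2^(h-1) + int(x)·2^h for a non-leaf of height h, where nodes are binary strings and the root is the empty string.

```python
# Compute the label of a node x (a binary string) in the full binary
# tree with m leaves (m a power of two), using only O(log n) memory.
def label(x, m):
    h = m.bit_length() - 1 - len(x)     # height of node x (leaves have h = 0)
    k = int(x, 2) if x else 0           # integer represented by the string x
    return k + 1 if h == 0 else 2 ** (h - 1) + k * 2 ** h
```

For m = 8 this gives the leaves labels 1 to 8 and the root label m/2 = 4, as in Lemma 5.1; the computation touches only the node's own string, never the whole tree.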
Lemma 5.1.

Let T be a non-trivial tree, i.e., with m ≥ 2. For every vertex x, label(x) ∈ {1, …, m}. For the root r, label(r) = m/2. Consider any i ∈ {1, …, m}. There is a unique leaf x of T such that label(x) = i. If i ≤ m - 1, there is a unique non-leaf y of T such that label(y) = i, and there is no non-leaf of T whose label is m.

Proof.

Let x be a node of T of height h. As explained above, int(x) ∈ {0, …, m/2^h - 1}. It is clear that label(x) ≥ 1. If h = 0, then label(x) = int(x) + 1 ≤ m. Else label(x) = 2^(h-1) + int(x)·2^h ≤ 2^(h-1) + (m/2^h - 1)·2^h = m - 2^(h-1) ≤ m.

The root of T has height log_2 m and int(r) = 0, hence, by definition, label(r) = 2^(log_2 m - 1) = m/2. Now, consider any i ∈ {1, …, m}. Since all leaves are at height 0, there is a unique leaf x with label(x) = int(x) + 1 = i. Suppose that i ≤ m - 1. By the unique integer factorization of i into a power of two and an odd part, there exist unique h ≥ 1 and k ≥ 0 such that i = 2^(h-1)·(2k + 1) = 2^(h-1) + k·2^h. This decomposition can be easily obtained from the binary representation of i. Consider the unique (non-leaf) node y of height h with int(y) = k; it is the unique non-leaf such that label(y) = i. Finally, there is no non-leaf of T with label m: we just proved that each element of {1, …, m - 1} has a unique inverse image under the labelling among the non-leaves, and since the number of non-leaf nodes is exactly m - 1, no non-leaf y can have label(y) = m. ∎
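The uniqueness argument in the proof is constructive: the height and position of the non-leaf carrying label i can be read off i's binary representation. A sketch, under the labelling form 2^(h-1) + k·2^h assumed above:

```python
# Recover (height h, position k) of the non-leaf labelled i, 1 <= i <= m-1,
# from the factorization i = 2**(h-1) * (2k + 1).
def nonleaf_of(i):
    h = (i & -i).bit_length()           # i & -i is 2**(h-1), the largest
    k = i >> h                          #   power of two dividing i
    return h, k
```

This is the kind of O(log n)-memory local computation that lets a node answer queries about the labelled tree without ever materialising it.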

By Lemma 5.1, when considering the labelling, each i ∈ {1, …, m} appears once or twice in T: on one leaf node and on at most one non-leaf node. Thus, we can use the labelling to unambiguously refer to the nodes of T. Namely, we refer to the leaf of T with label i as leaf i, and, similarly, if i ≤ m - 1, we refer to the non-leaf of T with label i as non-leaf i. By abuse of notation, in what follows T denotes both the tree itself and its labelling as defined above. The following lemma directly follows from the definition of the labelling.

Lemma 5.2.

Let T be a non-trivial tree (m ≥ 2). Consider any i ∈ {1, …, m}. The parent of the leaf i is the non-leaf 2⌈i/2⌉ - 1. If i ≤ m - 1, then let h and k be such that i = 2^(h-1) + k·2^h. If k is even, the parent of the non-leaf i is the non-leaf i + 2^(h-1); if k is odd, it is the non-leaf i - 2^(h-1). If h ≥ 2, the left and right children of the non-leaf i are the non-leaves i - 2^(h-2) and i + 2^(h-2), respectively. If