Self-Stabilizing Supervised Publish-Subscribe Systems

10/23/2017 ∙ by Michael Feldmann, et al. ∙ Universität Paderborn

In this paper we present two major results: First, we introduce the first self-stabilizing version of a supervised overlay network by presenting a self-stabilizing supervised skip ring. Second, we show how to use the self-stabilizing supervised skip ring to construct an efficient self-stabilizing publish-subscribe system. That is, in addition to stabilizing the overlay network, every subscriber of a topic will eventually know all of the publications that have been issued so far for that topic. The communication work needed to process a subscribe or unsubscribe operation is just a constant in a legitimate state, and the communication work of checking whether the system is still in a legitimate state is just a constant in expectation for the supervisor as well as any process in the system.

1 Introduction

The publish subscribe paradigm ([7, 8]) is a very popular paradigm for the targeted dissemination of information. It allows clients to subscribe to certain topics or contents so that they will only receive information that matches their interests. In the traditional client-server approach the dissemination of information is handled by a server (also called broker), which has the benefit that the publishers are decoupled from the subscribers: the publisher does not have to know the relevant subscribers, and the publisher and subscribers do not have to be online at the same time. However, in this case the availability of the publish subscribe system critically depends on the availability of the server, and the server has to be powerful enough to handle the dissemination of the publish requests. An alternative approach is to use a peer-to-peer system. However, if no commonly known gateway is available, the peer-to-peer system cannot recover from overlay network partitions. In practice, peer-to-peer systems usually have a commonly known gateway since otherwise new peers may not be able to get in contact with a peer that is currently in the system (and can therefore process the join request). In our supervised overlay network approach we assume that there is a commonly known gateway, called supervisor, that just handles subscribe and unsubscribe requests but does not handle the dissemination of publish requests, which will be handled by the subscribers in a peer-to-peer manner. We are interested in realizing a topic-based supervised publish subscribe system, which means that peers can subscribe to certain topics (that are usually relatively broad and predefined by the supervisor).

Topic-based publish subscribe systems have many important applications. Apart from providing a targeted news service, they can be used, for example, to realize a group communication service [9], which is considered an important building block for many other applications ranging from chat groups and collaborative working groups to online market places (where clients publish service requests), distributed file systems or transaction systems. To ensure the reliable dissemination of publish requests in a topic-based publish subscribe system, we present a self-stabilizing supervised publish subscribe system, which ensures that for any initial state (including overlay network partitions) eventually a legitimate state will be reached in which all subscribers of a topic know about all publish requests that have been issued for that topic. We also show that the overhead for the supervisor in our system is very low. In fact, the message overhead of the supervisor is just a constant for subscribe and unsubscribe operations, and the supervisor has a low maintenance overhead in a legitimate state.

1.1 Model

We model the overlay network of a distributed system as a directed graph $G = (V, E)$, where $n = |V|$. Each peer is represented by a node $v \in V$. Each node $v \in V$ is identified by its unique reference or identifier (called ID). Additionally, each node $v$ maintains local protocol-based variables and has a channel, which is a system-based variable that contains incoming messages. We assume a channel to be able to store any finite number of messages, and messages are never duplicated or lost in the channel. If a node $u$ has the reference of some other node $v$, $u$ can send a message $m$ to $v$ by putting $m$ into the channel of $v$. There is a directed edge $(u, v) \in E$ whenever $u$ stores a reference of $v$ in its local memory or there is a message in the channel of $u$ carrying the reference of $v$. In the former case, we call that edge explicit and in the latter case we call that edge implicit. Note that every node is assumed to know the supervisor, and this information is read-only, so $G$ always contains a directed star graph from all peers to the supervisor.

Nodes may execute actions: An action is just a standard procedure and has the form $\langle name \rangle(\langle parameters \rangle) : \langle statements \rangle$, where $name$ is the name of that action, $parameters$ defines the set of parameters and $statements$ defines the statements that are executed when calling that action. It may be called locally or remotely, i.e., every message that is sent to a node has the form $\langle name \rangle(\langle parameters \rangle)$. When a node $u$ processes a message $m$, then $m$ is removed from the channel of $u$. Additionally, there is an action that is not triggered by messages but is executed periodically by each node. We call this action Timeout.

We define the system state to be an assignment of a value to every node’s variables and messages to each channel. A computation is an infinite sequence of system states, where the state $s_{i+1}$ can be reached from its previous state $s_i$ by executing an action that is enabled in $s_i$. We call the first state of a given computation the initial state. We assume fair message receipt, meaning that every message of the form $\langle name \rangle(\langle parameters \rangle)$ that is contained in some channel is eventually processed. Furthermore, we assume weakly fair action execution, meaning that if an action is enabled in all but finitely many states of a computation, then this action is executed infinitely often. The Timeout action is an example of such an action. We place no bounds on message propagation delay or relative node execution speed, i.e., we allow fully asynchronous computations and non-FIFO message delivery. Our protocol does not manipulate node identifiers and thus only operates on them in compare-store-send mode, i.e., the nodes are only allowed to compare node IDs, store them in a node’s local memory or send them in a message.

In this paper we assume for simplicity that there are no corrupted IDs (i.e., IDs of unavailable nodes) in the initial state of the system. However, dealing with them is easy when having a failure detector that is eventually correct since, due to the supervisor, the correctness of our protocol cannot be endangered by sending messages to non-available nodes. Since our protocol just deals with IDs in a compare-store-send manner, this implies that node IDs will always be non-corrupted for all computations. Nevertheless, the node channels may initially contain an arbitrary finite number of messages containing false information. We call these messages corrupted, and we will argue that eventually there will not be any corrupted messages in the system. We will show that our protocol realizes a self-stabilizing supervised publish-subscribe system.

[Self-stabilization] A protocol is self-stabilizing w.r.t. a set of legitimate states if it satisfies the following two properties:

  • Convergence: Starting from an arbitrary system state, the protocol is guaranteed to arrive at a legitimate state.

  • Closure: Starting from a legitimate state, the protocol remains in legitimate states thereafter.

1.2 Related Work

The concept of self-stabilizing algorithms for distributed systems goes back to the year 1974, when E. W. Dijkstra introduced the idea of self-stabilization in a token-based ring [4]. Many self-stabilizing protocols for various types of overlays have been proposed, like sorted lists [18], de Bruijn graphs [19], Chord graphs [13] and many more. There is even a universal approach, which is able to derive self-stabilizing protocols for several types of topologies [2].

The cycle topology is particularly important for our work. Our cycle protocol is based on [12], in which the authors construct a self-stabilizing cycle that acts as a base for additional long-range links, both together forming a small-world network.

The paper closest to our work is by Kothapalli and Scheideler [14]. The authors provide a general framework for constructing a supervised peer-to-peer system in which the supervisor only has to store a constant amount of information about the system at any time and only has to send out a constant number of messages to integrate or remove a node. However, their system is not self-stabilizing.

In the literature there are publish-subscribe systems that are self-stabilizing: e.g. in [17] the authors present different content-based routing algorithms in a self-stabilizing (acyclic) broker overlay network that clients can publish messages to. Their main idea is a leasing mechanism for routing tables such that it is guaranteed that once a client subscribes to a topic there is a point in time such that every publication which is issued thereafter is delivered to the newly subscribed client (i.e., there are no guarantees for older publications). While the authors focus on the routing tables and take the overlay network as a given ingredient, our work focuses on constructing a self-stabilizing supervised overlay network and then using it to obtain a self-stabilizing publish-subscribe system.

A self-stabilizing publish-subscribe system for wireless ad-hoc networks is proposed in [20], which builds upon the work of [21]: Similar to our work, the authors arrange nodes in a cycle with shortcuts and present a routing algorithm that makes use of these shortcuts to deliver new publications for topics to subscribers only after steps. Subscribe and unsubscribe requests are processed by updating the routing table at nodes. Both systems described above differ from our approach, as they solely focus on the routing scheme and updates of the routing tables, while we focus on updating the topology upon subscribe/unsubscribe requests. Additionally, our system is able to deliver publications in $O(\log n)$ steps, if we use flooding, since we use a network with logarithmic diameter. Furthermore, we are also able to deliver all publications of a domain to a new subscriber after only a constant number of rounds.

There is a close relationship between group communication services (e.g., [9, 1]) and publish-subscribe systems. Processes are ordered in groups in both paradigms and group-messages are only distributed among all members of some group. Self-stabilizing group communication services are proposed in [6] for ad-hoc networks and in [5] for directed networks. However, there are some key differences: In group communication services, participants have to agree on group membership views. This results in a high memory overhead for each member of a group, as nodes in a group technically form a clique. On the other hand subscribers of topics in publish-subscribe systems are in general not interested in any other members of the topic. For our approach, this results in logarithmic worst-case and constant average case degree for subscribers.

1.3 Our Contribution

To the best of our knowledge, we present the first self-stabilizing protocol for a supervised overlay network. We focus on a topology that is a ring with shortcuts, which we call skip ring. The corresponding protocol BuildSR is split up into two subprotocols: One protocol is executed at the supervisor (see Section 3.1), the other one is executed by each subscriber (Section 3.2). Our basic protocol assumes that all references actually belong to existing nodes. However, we also present an extension (see Section 3.3) to handle references to non-existing nodes and unannounced failures of nodes. In contrast to the supervised overlay network proposed in [14], our new protocol lets the supervisor handle multiple insertions/deletions in parallel without having to rely on confirmations from other nodes. This comes, however, at the cost of storing many more references than the solution in [14].

The skip ring shares some similarities with other shortcut-based peer-to-peer systems like Chord networks [13] or skip graphs [10]. However, our network has better congestion than these networks, as the supervised approach allows a much more balanced distribution of the nodes.

We show how to use the supervised skip ring to obtain a self-stabilizing publish-subscribe system (see Section 4) in which each skip ring corresponds to a topic. Every subscriber of a topic eventually gets all publications that have been issued so far for that topic. The shortcuts in the skip ring are helpful when using flooding to distribute new publications among all subscribers, since a skip ring of $n$ nodes has diameter $O(\log n)$.

In our self-stabilizing publish-subscribe system the message overhead of the supervisor is linear in the number of topics (but not in the number of subscribers), if we use a simple generalization strategy in which each topic corresponds to one skip ring. This, of course, decreases the applicability of our system in large-scale scenarios. However, better scalability can be achieved by organizing topics in a hierarchical manner, or by having different supervisors for each topic. For the latter scenario, one could make use of a self-stabilizing distributed hash table (with consistent hashing) for all supervisors, in which a sub-interval of $[0, 1)$ is assigned to each supervisor. By hashing IDs of topics in the same manner, each supervisor is then only responsible for the topics in its sub-interval. Since solutions for self-stabilizing distributed hash tables already exist in the literature (see e.g. [11]), we do not elaborate on them further in this paper.

2 Preliminaries

In this section we formally introduce the topology for a skip ring (Section 2.1). As a base for our self-stabilizing protocol we introduce the BuildRing protocol from [12] to arrange nodes in a sorted ring (Section 2.2).

2.1 Skip Ring

Let $l : \mathbb{N}_0 \to \{0,1\}^*$ be a mapping with the property that $l(x) = (x_{d-1} \ldots x_0 \, x_d)$ for every $x \in \mathbb{N}_0$ with binary representation $(x_d \ldots x_0)_2$ (where $d$ is minimum possible). Intuitively, $l$ takes the leading bit of the binary string representing the input value and moves it to the units place. In our setting the supervisor will use $l$ to assign a (unique) label to each subscriber. Labels are generated in the order: $0$, $1$, $01$, $11$, $001$, $011$, $101$, $111$, $0001$. Note that $l$ is invertible. We call the value $l(x)$ a label. Denote by $|label|$ the minimum number of bits used to encode $label$. A label $y = (y_1 \ldots y_d)$ may either be represented as a bit string or as a real-valued number within $[0, 1)$ by evaluating the function $r$ with $r(y) = \sum_{i=1}^{d} y_i / 2^i$. The function $r$ induces an ordering of all nodes in a ring, which will be used in the following to define the skip ring:
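The label mapping described above can be sketched in a few lines of Python. This is only an illustration: the function names `label`, `real_value` and `index_of` are hypothetical, and reading a label as a binary fraction is an assumption about how the real-valued representation is evaluated.

```python
from fractions import Fraction


def label(x: int) -> str:
    """Return the label of the x-th subscriber as a bit string."""
    if x < 0:
        raise ValueError("x must be non-negative")
    bits = bin(x)[2:]            # binary representation, leading bit first
    return bits[1:] + bits[0]    # move the leading bit to the units place


def real_value(lab: str) -> Fraction:
    """Assumption: read the label y_1...y_d as the binary fraction 0.y_1...y_d."""
    return Fraction(int(lab, 2), 2 ** len(lab))


def index_of(lab: str) -> int:
    """Inverse of label(): move the last bit back to the front."""
    return int(lab[-1] + lab[:-1], 2)


if __name__ == "__main__":
    for i in range(9):
        print(i, label(i), real_value(label(i)))
```

Under this reading, the first labels 0, 1, 01, 11 map to the positions 0, 1/2, 1/4, 3/4, so every new generation of labels falls midway between the positions that are already occupied.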

[Skip Ring] A skip ring is a graph with nodes. is defined as follows:

  • Each node has a unique label denoted by with .

  • are consecutive in the ordering induced by . Denote the edges in as ring edges.

  • is part of the sorted ring w.r.t. node labels over all nodes in , , where . Denote as a shortcut on level , if .

The label of a node is independent from its unique ID and will be determined by the supervisor.

The intuition behind and is that we want all nodes with label of length at most to form a (bidirected) sorted ring for all . For these edges are stored in , for they are stored in . Due to the way we defined the function it holds that for all the values are uniformly spread in between old values with . This implies that the longer a node is a participant of the system, the more shortcuts it has. This makes sense from a practical point of view, since older and thus more reliable nodes hold more connectivity responsibility in form of more shortcuts.

The decision whether two nodes are connected or not only depends on the labels of the nodes, which means that an arrival/departure of a node only affects its neighbors (see Section 4.1 for details). Figure 1 illustrates a skip ring with 16 nodes.

Figure 1: A skip ring consisting of 16 nodes. The triples are of the form where , is the corresponding label and is the real valued version of the label. Black edges are ring edges (), green edges are shortcuts for , red edges for level and the blue edge is the shortcut for .

The following lemma follows from the definition of the skip ring:

[Node Degree] In a legitimate state, the degree of nodes in a skip ring is logarithmic in the worst case and constant in the average case.

Proof.

For convenience, we define . Node has shortcuts to nodes with label of length for each . Having nodes in the system, we know that is upper bounded by , which sums up the degree of to be .

Next, we want to compute the average degree of a node. We count the overall number of edges in a stable system containing subscribers. Let denote the number of subscribers with label of length . We have

Recall that the maximum length of a label is equal to in a stable state. Combining this fact with the above formula for the node degree, we get the following result for the number of edges in :

Dividing this value by yields an upper bound of for the average node degree. ∎
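The degree claim can be checked numerically with a small sketch. It assumes that a label is read as a binary fraction and that, for every level from 1 up to the maximum label length, all nodes with label length at most that level form a sorted ring. The names `position`, `label_length` and `skip_ring_degrees` are hypothetical.

```python
from fractions import Fraction


def position(i: int) -> Fraction:
    """Real-valued label of the i-th generated label (assumed binary fraction)."""
    if i == 0:
        return Fraction(0)
    d = i.bit_length() - 1          # index of the leading bit
    low = i - (1 << d)              # remaining bits without the leading bit
    return Fraction(2 * low + 1, 1 << (d + 1))


def label_length(pos: Fraction) -> int:
    """Number of bits of the label sitting at a dyadic position."""
    return max(1, pos.denominator.bit_length() - 1)


def skip_ring_degrees(n: int) -> dict:
    """Map each node position to its number of distinct neighbors."""
    nodes = [position(i) for i in range(n)]
    top = max(label_length(p) for p in nodes)
    neighbors = {p: set() for p in nodes}
    for level in range(1, top + 1):
        ring = sorted(p for p in nodes if label_length(p) <= level)
        if len(ring) < 2:
            continue
        # connect consecutive nodes and close the ring
        for a, b in zip(ring, ring[1:] + ring[:1]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {p: len(s) for p, s in neighbors.items()}


if __name__ == "__main__":
    deg = skip_ring_degrees(16)
    print(max(deg.values()), Fraction(sum(deg.values()), len(deg)))
```

For 16 nodes this sketch yields a maximum degree of 7 (for the two oldest nodes) and an average degree of 58/16 = 3.625, in line with a logarithmic worst case and a constant average.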

2.2 Self-Stabilizing Ring

The base of our self-stabilizing protocol is the BuildRing protocol from [12] that organizes all nodes in a sorted ring according to their labels, using linearization [18]: Each node stores edges to its closest left and right neighbors (denoted by ) according to ’s label (denoted by ). Any other nodes are delegated by to either or (depending on which node is closer to ). Additionally the node with minimum label stores an edge to the node with maximum label and vice versa, such that the sorted ring is closed. Nodes periodically introduce themselves to their neighbors and in the sorted ring: This means that sends a message to / containing a reference to itself. This way nodes can check, if the sorted ring is in a legitimate state from their point of view or not.

In our setting nodes may assume corrupted labels for their neighboring nodes in any non-legitimate state: If node $u$ has an edge to $v$, then $u$ locally stores a tuple consisting of the reference to $v$ and the label that $u$ associates with $v$. While the reference to $v$ is assumed to be correct by definition at any time, the label variable of $v$ may change to a different value at some point in time. Unfortunately, $u$ still has the old label value associated with $v$, implying that the label stored at $u$ for $v$ differs from the actual label of $v$. As a consequence, we extend the BuildRing protocol as follows: Whenever a node $u$ introduces itself to another node $v$, then $u$ informs $v$ about the label that $u$ thinks is assigned to $v$. Node $v$ then checks the label for correctness by comparing it with its own label, and if the two differ, $v$ sends its correct label to $u$.
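A minimal sketch of this label check follows. It deliberately leaves out the linearization part of BuildRing and only models the two messages involved; the class `Node`, the message names `introduce` and `label`, and the field `believed` are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, name: str, label: str):
        self.name = name
        self.label = label
        self.believed = {}       # neighbor name -> label this node believes the neighbor has
        self.channel = deque()   # incoming messages

    def introduce_to(self, other: "Node") -> None:
        """Send an introduction that carries the label we believe `other` has."""
        other.channel.append(("introduce", self, self.believed.get(other.name)))

    def process_one(self) -> None:
        """Process (and thereby remove) one message from the channel."""
        kind, sender, payload = self.channel.popleft()
        if kind == "introduce":
            # Reply only if the sender holds an outdated label for us.
            if payload != self.label:
                sender.channel.append(("label", self, self.label))
        elif kind == "label":
            self.believed[sender.name] = payload


if __name__ == "__main__":
    u, v = Node("u", "01"), Node("v", "11")
    u.believed["v"] = "111"      # corrupted label stored at u
    u.introduce_to(v)
    v.process_one()
    u.process_one()
    print(u.believed["v"])
```

Once the stored label is correct, further introductions trigger no reply, so the check adds no extra messages in a legitimate state.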

Including the modifications mentioned above, the extended BuildRing protocol is still self-stabilizing:

The BuildRing protocol with its extension is self-stabilizing.

Proof.

In case that there are no corrupted labels, the extended BuildRing protocol behaves the same as the standard BuildRing protocol, so we refer the reader to [12] to verify that the protocol is indeed self-stabilizing.

We are going to show that in case there exist corrupted node labels, these will eventually vanish. W.l.o.g. consider the variable for a node . We define the trace of as the chain of values , that are allocated to while the system stabilizes. By definition of the standard BuildRing protocol, labels stored in the trace for are monotonically decreasing. Furthermore, the trace is finite, since the number of labels (corrupted or correct) is finite. Let be the last node of this trace, i.e., eventually it holds . Then will introduce itself in its Timeout method to by sending a message storing itself and to . Upon receiving , is able to check if and send a reply storing the (correct) label to in case , s.t. corrects its label . This implies that the number of corrupted labels is reduced, but there may be a new trace generated for . But since the number of corrupted node labels is finite and is not duplicating, eventually, the overall number of corrupted node labels will reduce to . ∎

3 Self-stabilizing Supervised Skip Ring

In this section we first extend the skip ring topology by introducing a supervisor. The description of our BuildSR protocol is then split into two sub-protocols: One sub-protocol is executed by the supervisor, the other one is executed by every other node. Adopting publish-subscribe terminology, we refer to a node as a subscriber for the rest of the paper.

Recall that every subscriber is assumed to know the supervisor, and this information is read-only, so the graph always contains an edge from every subscriber to the supervisor. The assumption of having such a supervisor is not far-fetched, because even pure peer-to-peer systems need a common gateway that acts as an entrance point for peers.

Our goal in this section is to construct a self-stabilizing protocol in which subscribers form a skip ring with the help of the supervisor, starting from any initial state. The extension to a self-stabilizing publish-subscribe system is then described in Section 4.

3.1 Supervisor Protocol

The first part of the BuildSR protocol is executed by the supervisor. The supervisor maintains the following variables:

  • A database containing subscribers and their corresponding labels. Denote by $n$ the number of tuples in the database.

  • A variable that is used to notify subscribers in a round-robin fashion.

In the supervisor’s Timeout method, the supervisor chooses a subscriber $v$ in a round-robin fashion (using the round-robin variable) from its database. Then the supervisor sends a message to $v$ containing the label of $v$, as well as the correct values for the predecessor and successor of $v$ according to the database. We call such a triple the configuration for $v$.
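The Timeout step of the supervisor can be sketched as follows. This is a simplified illustration under stated assumptions: the database is modeled as a dictionary from label to subscriber ID, labels are ordered by reading them as binary fractions, subscribers are visited in ring order, and the names `Supervisor`, `next_index` and `timeout` are hypothetical. Instead of sending a message, the sketch returns the recipient together with its configuration.

```python
from fractions import Fraction


def real_value(label: str) -> Fraction:
    """Assumption: read a label y_1...y_d as the binary fraction 0.y_1...y_d."""
    return Fraction(int(label, 2), 2 ** len(label))


class Supervisor:
    def __init__(self, database: dict):
        self.database = dict(database)   # label (bit string) -> subscriber ID
        self.next_index = 0              # hypothetical name for the round-robin variable

    def timeout(self):
        """Pick the next subscriber and return (subscriber, configuration)."""
        if not self.database:
            return None
        ring = sorted(self.database, key=real_value)   # labels in ring order
        i = self.next_index % len(ring)
        self.next_index = (i + 1) % len(ring)
        pred = ring[i - 1]                   # index -1 wraps around for i == 0
        succ = ring[(i + 1) % len(ring)]
        config = (ring[i],
                  (pred, self.database[pred]),
                  (succ, self.database[succ]))
        return self.database[ring[i]], config


if __name__ == "__main__":
    sup = Supervisor({"0": "a", "1": "b", "01": "c", "11": "d"})
    for _ in range(4):
        print(sup.timeout())
```

Each call produces exactly one configuration, which matches the idea that the supervisor sends a constant number of messages per Timeout.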

In addition to the above action, the supervisor has to check the integrity of its database: We say that the database of the supervisor is corrupted if at least one of the following conditions holds:

  (i) There exists a tuple whose subscriber entry is $\perp$, i.e., a tuple without any subscriber.

  (ii) There exist two tuples with different labels that store the same subscriber, i.e., multiple tuples storing the same subscriber.

  (iii) There exists $i \in \{0, \ldots, n-1\}$ such that no tuple in the database carries the label $l(i)$, i.e., there are labels missing.

  (iv) There exists a tuple whose label is not among $l(0), \ldots, l(n-1)$, i.e., a tuple with an incorrect label.

All of these cases may occur in initial states. Note that when using a hashmap for the database, we do not need to check explicitly whether there are multiple tuples with the same label or whether there are tuples with the label set to $\perp$. We perform the following actions to tackle the above four cases:

  (i) Upon detecting a tuple without a subscriber, we simply remove it from the database.

  (ii) Whenever a subscriber $v$ wants to unsubscribe or request its configuration, the supervisor first searches the database for all tuples storing $v$. It then removes all duplicates except the tuple with lowest label, guaranteeing that $v$ is associated with no more than one label.

  (iii) In Timeout the supervisor checks for all $i \in \{0, \ldots, n-1\}$ whether there is a tuple with label $l(i)$ stored in the database. If not, then the supervisor takes the tuple with label $l(j)$ for maximum $j$ and replaces its label with $l(i)$.

  (iv) It is easy to see that the action for (iii) also solves case (iv).

Observe that all of these actions are performed locally by the supervisor, i.e., they generate no messages. Therefore we assume that the database of the supervisor is always in a non-corrupted state from this point on.
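The repair actions can be combined into one local routine, sketched below. Several points are assumptions made for illustration: the database is a dictionary from label to subscriber ID with `None` standing for a missing subscriber, all keys are well-formed labels, "lowest label" is interpreted as the label generated first, and all four cases are handled in a single pass rather than at the points in time named in the text. The names `label`, `index_of` and `repair` are hypothetical.

```python
def label(x: int) -> str:
    """Label of the x-th subscriber: leading bit moved to the units place."""
    bits = bin(x)[2:]
    return bits[1:] + bits[0]


def index_of(lab: str) -> int:
    """Inverse of label(); assumes a well-formed label."""
    return int(lab[-1] + lab[:-1], 2)


def repair(database: dict) -> dict:
    """Return a repaired copy of a label -> subscriber database."""
    # (i) remove tuples without a subscriber
    db = {lab: sub for lab, sub in database.items() if sub is not None}

    # (ii) keep only the first-generated label for every subscriber
    kept = {}
    for lab in sorted(db, key=index_of):
        kept.setdefault(db[lab], lab)
    db = {lab: sub for sub, lab in kept.items()}

    # (iii)/(iv) fill every missing label l(i), i < n, by relabeling the
    # tuple whose label has the largest index
    n = len(db)
    for i in range(n):
        if label(i) not in db:
            worst = max(db, key=index_of)
            db[label(i)] = db.pop(worst)
    return db


if __name__ == "__main__":
    corrupted = {"0": "a", "01": None, "11": "b", "001": "a", "0001": "c"}
    print(repair(corrupted))
```

Whenever a label with index below `n` is missing, the tuple with the largest index must carry an index of at least `n`, so relabeling it never displaces a correct entry. The routine only touches local state, consistent with the observation that the repair generates no messages.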

3.2 Subscriber Protocol

In this section we discuss the part of the BuildSR protocol that is executed by each subscriber. First, we present the variables needed for a subscriber. Note that we intentionally omit the reference to the supervisor here, since it is assumed to be hard-coded. A subscriber stores the following variables:

  • : The unique label of or if has not received a label yet.

  • : Left and right neighbor in the ring as well as the cyclic connection in case is minimal/maximal.

  • : All of ’s shortcuts.

For the rest of the protocol description, we use and to indicate ’s left (resp. right) neighbor in the ring even if the left (resp. right) neighbor is stored in instead of (resp. ). We also may refer to the variables as ’s direct ring neighbors. Recall that each subscriber executes the extended BuildRing protocol from Section 2.2.

3.2.1 Receiving correct Labels

For now we focus on the ring edges only. Our first goal is to guarantee that every subscriber eventually stores its correct label in .

Recall that we have periodic communication from the supervisor to the subscribers, i.e., the supervisor periodically sends out the configurations to all subscribers stored in its database. This action alone does not suffice in order to make sure that every subscriber eventually stores its correct label, since in initial states the database may be empty and subscriber labels may store arbitrary values. Thus, we also need periodic communication from subscribers to the supervisor. The challenge here is to not overload the supervisor with requests in legitimate states of the system. Each subscriber periodically executes the following actions:

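  • If the subscriber $v$ has not received a label yet, then $v$ asks the supervisor to integrate $v$ into the database and send its correct configuration.

  • If $v$ already stores a label, then, with a probability that is determined by the length of its label, $v$ asks the supervisor for its correct configuration.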

Action (ii) is dedicated to handling subscribers that have incorrect labels or already store a label but are not known to the supervisor. Upon receiving a configuration request from a subscriber $v$, the supervisor integrates $v$ into the database (if it is not already contained in the database) and sends $v$ its configuration and thus its correct label.

We still need some further actions to tackle special initial states: Imagine a subscriber $v$ having a label such that the probability mentioned in (ii) becomes negligible. In case $v$ is not contained in the supervisor’s database yet, $v$ will send a configuration request to the supervisor with only very low probability. The following action is able to solve this problem under the assumption that there exists a subscriber $w$ that is already contained in the supervisor’s database and has stored $v$ as one of its direct ring neighbors.

  • W.l.o.g. let . If receives a configuration from the supervisor and , it checks whether is closer than the left ring neighbor proposed by the configuration, i.e., . In case this holds, requests the supervisor to send the correct configuration to .

The assumption for action (iii) may not hold in all initial states, i.e., there is a connected component in which all subscribers have stored labels such that the probability mentioned in (ii) becomes negligible. Note that actions (i)-(iii) suffice to show convergence in theory. In order to improve the time it takes the network to converge, we introduce one last periodic action:

  • Subscriber periodically requests its configuration with probability from the supervisor if it determines, based only on its local information, that its label is minimal.

We now sketch why eventually all subscribers in a connected component $C$ get their correct label. This is obviously the case when all subscribers in $C$ are stored in the supervisor’s database, as the supervisor will then periodically hand out the correct labels in a round-robin fashion. We call a subscriber that is already stored in the supervisor’s database recorded. Action (iv) guarantees that we quickly have at least one recorded subscriber in a connected component $C$. Assume that $C$ still contains non-recorded subscribers. As long as the supervisor is able to introduce new recorded subscribers to recorded subscribers in $C$, the size of $C$ grows, but since the number of subscribers is finite, $C$ will eventually become static. For such a static connected component $C$ we know that due to BuildRing, subscribers in $C$ eventually form a sorted ring. Then there exists at least one ring edge from a recorded subscriber $v$ to a non-recorded subscriber $w$. Furthermore, the correct ring neighbor of $v$ indicated by its configuration has to be further away from $v$ than $w$ (consider, for instance, the right neighbor of $v$). This holds because no new subscriber can be introduced to a recorded subscriber in $C$. Once $v$ receives its configuration from the supervisor, it triggers action (iii) and requests the configuration for $w$, leading to $w$ being inserted into the supervisor’s database and thus reducing the number of non-recorded subscribers in $C$ by one. This inductively implies that eventually all subscribers in $C$ are recorded. We now want to bound the expected number of requests that are periodically sent out to the supervisor when the system is in a legitimate state. For the next lemma, we define a timeout interval as the time in which every subscriber has called its Timeout method exactly once.

Consider a supervised skip ring with subscribers in a legitimate state. The expected number of configuration requests sent out by all subscribers is less than in each timeout interval.

Proof.

Since the system has subscribers, the maximum length of a subscriber’s label is equal to in a legitimate state. In a legitimate state, only the second action (ii) is executed by subscribers, as all subscribers have stored their correct configuration. Thus, requests are only sent from a subscriber to the supervisor with probability based on ’s label length . The number of subscribers with label of length is equal to and the probability that a subscriber with label of length contacts the supervisor in its Timeout procedure is equal to . It follows that the expected number of configuration requests sent out by subscribers with label of length is equal to . In summary, the expected number of configuration requests that are sent out by all subscribers is equal to . ∎

3.2.2 Maintaining Shortcuts

In this section we describe how subscribers establish and maintain shortcut edges. Recall that we have shortcuts on levels , where represents the ring edges that are already established. A subscriber with label of length has exactly 2 shortcuts on each level in in a legitimate state.

We first describe how a subscriber is able to compute all its shortcut labels locally, based only on the information of its left and right direct ring neighbors. The following approach only computes the respective labels in that a node should have shortcuts to, but not the subscribers that are associated with these labels. The idea is the following: In general, a subscriber has only shortcuts to other subscribers that lie on the same semicircle as , i.e., either the semicircle of subscribers within the interval or the semicircle of subscribers within the interval (where the is represented by the subscriber with label ). Consider a subscriber with and its two ring neighbors such that and . If recognizes that , then knows that it has to have a shortcut with label and , because node was previously inserted between subscribers with labels and . After this, can apply this method recursively, i.e., it checks for the computed label if until it reaches a label of less or equal length. This same procedure is applied analogously for .

As an example, recall the (stable) ring from Figure 1. Suppose we want to compute all shortcut labels for the subscriber with (real-valued) label , based only on the labels of its direct ring neighbors, which are and . We know that the label has length , which is greater than the length of label , which is . Thus, we get a shortcut for with label . The label has length , which is still greater than . Hence we compute a shortcut with label . Finally we know that the length of label is , which is smaller than , which terminates the algorithm. The computation of shortcut labels to and works analogously.
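The local computation of shortcut labels can be sketched on the real-valued labels. The sketch rests on one reading of the procedure: if a ring neighbor has a longer label than the subscriber itself, that neighbor was inserted midway between the subscriber and the next shortcut target, so the target is obtained by mirroring the subscriber's position at the neighbor (modulo 1 to respect the ring). The names `label_length` and `shortcut_positions` are hypothetical, and positions are assumed to be dyadic fractions as in a legitimate state.

```python
from fractions import Fraction


def label_length(pos: Fraction) -> int:
    """Number of bits of the label sitting at a dyadic position in [0, 1)."""
    return max(1, pos.denominator.bit_length() - 1)


def shortcut_positions(own: Fraction, neighbor: Fraction) -> list:
    """Shortcut positions on the side of `neighbor`, nearest first.

    Assumes both arguments are dyadic fractions in [0, 1); for other
    inputs the loop is not guaranteed to terminate.
    """
    k = label_length(own)
    shortcuts = []
    current = neighbor
    # Continue while the last known node on this side is "younger" than us.
    while label_length(current) > k:
        current = (2 * current - own) % 1   # mirror own position at current
        shortcuts.append(current)
    return shortcuts


if __name__ == "__main__":
    own = Fraction(1, 4)
    print(shortcut_positions(own, Fraction(5, 16)))   # right side
    print(shortcut_positions(own, Fraction(3, 16)))   # left side
```

In a ring of 16 nodes, the subscriber at position 1/4 with ring neighbors 3/16 and 5/16 obtains the shortcut positions 3/8 and 1/2 on its right and 1/8 and 0 on its left, i.e., two shortcuts per level.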

We are now ready to define the self-stabilizing protocol that establishes and maintains shortcuts for all subscribers. Consider a subscriber with label length . On Timeout, checks if contains subscribers on level . If that is the case, then introduces to , by sending a message to containing the reference of as well as ’s label . Also, introduces to in the same manner. Note that for , has to consider its two ring neighbors instead of . On receipt of such an introduction message consisting of the pair , checks if it has a shortcut with . If that is the case, then replaces the existing node reference by and, if , forwards the reference of on the sorted ring via the BuildRing protocol. This way it is guaranteed that shortcuts are established in a bottom-up fashion.

3.3 Handling Subscriber Failures

We now consider the case that subscribers are allowed to crash without warning. In this case the address ceases to exist. Consequently, even though nodes may still send messages to , these messages do not invoke any action on . Note that we do not consider supervisor failure, since it is assumed to be hard-coded. The challenge here is to restore the system to a correct supervised skip ring that does not contain , i.e., we need to exclude from the system. In pure peer-to-peer systems this scenario is a problem, since we have to maintain failure detectors [3] at each node in order to be able to determine if some neighboring node has crashed. This leads to an increased overhead in the complete system. However, in our setting it suffices to establish only one single failure detector at the supervisor, because we only need to make sure that the will eventually contain the correct data. Consequently, if the supervisor notices that subscriber has crashed, it just has to remove from its . By periodically executing the actions for restoring a corrupted we know that the will eventually contain the correct data.

4 Self-Stabilizing Publish-Subscribe System

In this section we show how to use our BuildSR protocol as a self-stabilizing publish-subscribe system. We start by discussing some general modifications and then describe the operations subscribe, unsubscribe and publish.

Let be the set of available topics that one may subscribe to. To construct a publish-subscribe system out of our self-stabilizing supervised overlay network, we basically run a BuildSR protocol for each topic at the supervisor. Thus, the supervisor has to extend its to be in . From here on, we assume that each message contains the topic it refers to, such that the receiver of such a message can match it to the respective BuildSR protocol. Once a subscriber wants to subscribe to some topic , it starts running a new BuildSR protocol for topic . Upon unsubscribing, the subscriber may remove the respective BuildSR protocol, once it gets the permission from the supervisor, implying that the supervisor has removed the subscriber from its . By assigning the topic number to each message that is sent out, we can identify the appropriate protocol at the receiver. For convenience, we still consider only one supervised skip ring for the rest of the paper.

4.1 Subscribe/Unsubscribe

When processing a operation, the supervisor executes the following actions (denote by the number of nodes in the before the subscribe/unsubscribe request):

  1. Insert into the .

  2. Send its correct configuration .

The correctness of subscribe follows immediately, since our protocol is self-stabilizing. Note that the supervisor can easily extract the tuples and from the , since all tuples are sorted based on the value of their labels. Our approach has the advantage that it spreads multiple sequential subscribe operations through the skip ring, meaning that a pre-existing subscriber is involved (i.e., it has to change its configuration) only for two consecutive subscribe operations. Afterwards its configuration remains untouched until the number of subscribers has doubled. This is due to the definition of the label function . As an example consider the skip ring from Figure 1 and assume that there are new subscribers that want to join. Then these new subscribers are inserted in between consecutive pairs of old subscribers on the ring, as they receive (real-valued) labels , , .

When processing an operation, the supervisor executes the following actions:

  1. Remove from the .

  2. Get the tuple with from the and replace with ’s label in the .

  3. Send its new configuration .

  4. Inform that it is granted permission to delete all its connections to other subscribers.

After both subscribers have received their correct label from the supervisor, the ring will stabilize itself. Note that the supervisor’s is already in a legitimate state after the initial (resp. ) message has been processed by the supervisor. Therefore, the supervisor does not rely on additional information from subscribers to stabilize its . The following lemma states the correctness of unsubscribe.
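The supervisor-side bookkeeping for both operations can be sketched as follows. This is a simplified illustration under assumptions: the database is modelled as a dictionary from real-valued labels to subscriber identifiers, and the label function `label` is an assumed stand-in that assigns 0, 1/2, 1/4, 3/4, 1/8, ... to consecutive indices, which is consistent with new subscribers being placed between consecutive pairs of old ones. The class `Supervisor` and its method names are hypothetical.

```python
from fractions import Fraction


def label(i):
    """Assumed label function: 0, 1/2, 1/4, 3/4, 1/8, 3/8, ... for i = 0, 1, 2, ..."""
    if i == 0:
        return Fraction(0)
    k = i.bit_length() - 1          # level of index i
    j = i - (1 << k)                # position within that level
    return Fraction(2 * j + 1, 1 << (k + 1))


class Supervisor:
    def __init__(self):
        self.database = {}          # label -> subscriber id

    def configuration(self, lab):
        """(predecessor tuple, own label, successor tuple) on the sorted ring."""
        labels = sorted(self.database)
        idx = labels.index(lab)
        pred = labels[idx - 1]
        succ = labels[(idx + 1) % len(labels)]
        return (pred, self.database[pred]), lab, (succ, self.database[succ])

    def subscribe(self, v):
        n = len(self.database)      # number of entries before the request
        self.database[label(n)] = v
        return self.configuration(label(n))

    def unsubscribe(self, v):
        n = len(self.database)
        lab_v = next((l for l, s in self.database.items() if s == v), None)
        if lab_v is None:
            return None, None
        del self.database[lab_v]
        last = label(n - 1)
        if last == lab_v:           # v held the last label, nothing to move
            return None, None
        w = self.database.pop(last) # subscriber holding the last label ...
        self.database[lab_v] = w    # ... takes over the label of v
        return w, self.configuration(lab_v)
```

In this sketch each request touches a constant number of database entries, and after an unsubscribe the stored labels are again exactly the first n-1 values of the assumed label function.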

After a subscriber has sent an request to the supervisor, eventually gets disconnected from the graph induced by .

Proof.

Assume that subscriber sends an request to the supervisor. By definition of the unsubscribe protocol, the supervisor removes from its and sends its configuration that is . Hence, sets and answers all incoming introduction messages from other subscribers with the request to delete the connection to . By definition of BuildSR, every subscriber that has an edge will eventually introduce itself to , leading to eventually getting disconnected from the graph induced by . This proves the lemma. ∎

It follows from the above descriptions that the supervisor only has to send out a constant number of messages per subscribe/unsubscribe request:

In a legitimate state, the message overhead of the supervisor and subscribers is constant for subscribe/unsubscribe operations.

4.2 Publish

In the following paragraphs we extend our protocol to support publish operations in a self-stabilizing manner. Note that the presented approach is only meant to realize a self-stabilizing dissemination of publications. There exist dedicated protocols (e.g., flooding, see Section 4.3) that distribute publications among the subscribers more efficiently. A self-stabilizing protocol for publications is able to correct any mistakes that occurred in the flooding approach. For storing publications at each subscriber, we use an extended version of a Patricia trie [16], which lets us efficiently determine missing publications at subscribers. We first define the Patricia trie and then present a protocol that is able to merge all publications in all Patricia tries. This results in each subscriber storing all publications.

A trie is a search tree with node set over the alphabet . Every edge is associated with a label . Additionally, every key that has been inserted into the trie can be reached from the root of the trie by following the unique path of length whose concatenated edge labels result in .

A Patricia trie is a compressed trie in which all chains (i.e., maximal sequences of nodes of degree 1) are merged into a single edge whose label is equal to the concatenation of the labels of the merged trie edges. We store a Patricia trie at each subscriber , denoted by . Each leaf node in a Patricia trie stores a publication , where is the alphabet for publications. Note that each inner node of a Patricia trie has exactly 2 child nodes denoted by . Furthermore, we want to assign a label to each node: The label of an inner node is defined as the longest common prefix of the labels of ’s child nodes (with being the empty word). If is a leaf node storing a publication , we define ’s label to be the unique key generated by the collision-resistant hash function , where a pair contains the unique ID of the subscriber that generated the publication . Note that the constant and the hash function are known to all subscribers, ensuring that every label for a publication has the same length.

In addition to node labels, we let nodes store (unique) hash values: We use another collision-resistant hash function and define the hash value of a leaf node as . If is an inner node, then is defined as the hash of the concatenation of the hashes of ’s child nodes, i.e., . This approach is similar to a Merkle-Hash Tree (MHT) [15], which also hashes data using a collision-resistant hash function and building a tree on these hashes. However, our approach does not require one-way hash functions, which is a standard assumption in MHTs, because we do not require our scheme to be cryptographically secure.
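To make the node labels and hash values concrete, the following Python sketch builds such a hashed Patricia trie from a set of publications. It is an illustration under assumptions: keys are given directly as fixed-length bit strings (in the protocol they would come from the hash of the publisher ID and the publication), SHA-256 stands in for the collision-resistant hash function, and a leaf hashes its stored publication. The names `Node`, `H` and `build` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from os.path import commonprefix
from typing import Optional, Tuple


def H(text: str) -> str:
    """Stand-in for the collision-resistant hash function (assumption: SHA-256)."""
    return hashlib.sha256(text.encode()).hexdigest()


@dataclass
class Node:
    label: str                              # key for a leaf, common prefix for an inner node
    hash: str
    children: Tuple["Node", ...] = ()
    publication: Optional[str] = None


def build(pubs):
    """Build a Patricia trie from a dict {key bit string: publication}.

    All keys are assumed to be distinct and of equal length.
    """
    if not pubs:
        return None
    if len(pubs) == 1:
        (key, publication), = pubs.items()
        return Node(key, H(publication), (), publication)
    prefix = commonprefix(list(pubs))       # longest common prefix of all keys
    split = len(prefix)                     # first position where keys differ
    children = tuple(
        build({k: p for k, p in pubs.items() if k[split] == bit}) for bit in "01"
    )
    # An inner node hashes the concatenation of its children's hashes.
    return Node(prefix, H(children[0].hash + children[1].hash), children)


trie = build({"000": "A", "100": "P", "101": "C"})
```

Two subscribers storing the same set of publications obtain the same root hash regardless of insertion order, which is what allows whole subtries to be compared with a single hash value.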

If a subscriber wants to publish a message over the ring, just inserts into its own Patricia trie. The publication is then spread among all subscribers of the ring by the following protocol, executed at each subscriber : Subscriber periodically sends a request CheckTrie(, ) to one of its ring neighbors (chosen randomly) containing itself and the root node of ’s Patricia trie. Note that sending an arbitrary node along a message CheckTrie means that we only store and in the request while ignoring ’s outgoing edges. Upon receiving a request CheckTrie(, ) with , a subscriber does the following: It searches for the node with label and checks if . The following three cases may happen:

  • : Then we know that the set of publications stored in the subtrie of with root node are the same as the set of publications stored in the subtrie of with root node . Subscriber does not send any response to in this case.

  • : Then the contents of the subtries with roots differ in at least one publication. In order to detect the exact location, where both Patricia tries differ, responds to by sending a request CheckTrie(, , ) to , which is handled by as two separate CheckTrie requests CheckTrie(, ) and CheckTrie(, ).

  • does not exist in : Then contains publications that do not exist in . Subscriber is able to compute the label prefix of those missing publications: First, searches for the node with label prefix and minimal, i.e., with and minimal. If such a node exists, then may contain at least all publications with label prefix . Furthermore, knows that all publications with label prefix are missing in its Patricia trie. As a consequence, requests to continue checking the subtrie with root node of label and to deliver all publications with label prefix to . It does so by sending a CheckAndPublish(, , ) request to , where internally calls CheckTrie(, ) and, in addition, delivers all publications with label prefix to . In case that a node as described above cannot be found in , just requests to deliver all publications with prefix to , since that subtrie is missing in .

With this approach, only those publications are sent out that are assumed to be missing at the receiver.
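The three cases can be simulated with a short Python sketch. It is a simplified model under several assumptions: messages are processed from a local queue instead of being sent over the network, requested publications are delivered immediately, node labels and hashes are recomputed on demand from the stored key set instead of being kept in an explicit trie, and in the third case the missing prefix is taken to be the requested label extended by the bit on which the receiver's own subtrie does not branch. The names `Subscriber`, `top_label` and `check_trie` are hypothetical.

```python
import hashlib
from collections import deque
from os.path import commonprefix


def H(text):
    return hashlib.sha256(text.encode()).hexdigest()


class Subscriber:
    def __init__(self, name, pubs):
        self.name = name
        self.pubs = dict(pubs)              # key bit string -> publication

    def top_label(self, prefix):
        """Label of the shortest-labelled node having `prefix` as a prefix, or None."""
        keys = [k for k in self.pubs if k.startswith(prefix)]
        return commonprefix(keys) if keys else None

    def children(self, label):
        return [self.top_label(label + bit) for bit in "01"]

    def node_hash(self, label):
        keys = [k for k in self.pubs if k.startswith(label)]
        if len(keys) == 1:                  # leaf
            return H(self.pubs[keys[0]])
        return H("".join(self.node_hash(c) for c in self.children(label)))

    def node(self, label):
        return (label, self.node_hash(label))


def check_trie(u, v):
    """Subscriber u starts one CheckTrie exchange with v; returns the message count."""
    root = u.top_label("")
    if root is None:
        return 0
    # Message format: (sender, receiver, nodes to check, prefix to deliver or None).
    queue = deque([(u, v, [u.node(root)], None)])
    sent = 0
    while queue:
        src, dst, nodes, prefix = queue.popleft()
        sent += 1
        if prefix is not None:              # CheckAndPublish: deliver requested publications
            src.pubs.update({k: p for k, p in dst.pubs.items() if k.startswith(prefix)})
        for label, h in nodes:
            found = dst.top_label(label)
            if found == label:              # cases 1 and 2: node exists at the receiver
                if dst.node_hash(label) != h and label not in dst.pubs:
                    kids = [dst.node(c) for c in dst.children(label)]
                    queue.append((dst, src, kids, None))
            elif found is not None:         # case 3, a longer-labelled node exists
                missing = label + ("1" if found[len(label)] == "0" else "0")
                queue.append((dst, src, [dst.node(found)], missing))
            else:                           # case 3, the whole subtrie is missing
                queue.append((dst, src, [], label))
    return sent
```

In this model a single exchange only transfers publications towards the subscriber that detects a missing node, so both subscribers have to initiate exchanges over time before their tries agree.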

As an example consider two subscribers with Patricia tries as shown in Figure 2. Note that is missing in . We describe how will eventually receive .

Figure 2: Example Patricia tries and for two subscribers .

First assume that sends out a CheckTrie(, ) message to in its Timeout method with being the root node of ’s Patricia trie. Subscriber then compares the hash with the hash of its root node, which is not equal. Thus, sends a message CheckTrie(, ) to , which forces to compare the hashes of the nodes with labels resp. to the hashes resp. . Both comparisons result in the hashes being equal, which ends the chain of messages at subscriber .

Now assume that sends out a CheckTrie(, ) message to in its Timeout method with being the root node of ’s Patricia trie. Subscriber compares the with and spots a difference. Thus, it sends a message CheckTrie(, ) to . For the node with label this results in both hashes being equal, but then cannot find a node with label in its Patricia trie, which is why sends a message CheckAndPublish(, , ) to . Note that the node with label is the node with label of minimum length for which is a prefix. Thus, . The CheckAndPublish request forces to compare the hash of its node with label to the given hash , which results in both hashes being equal. Furthermore, sends all publications with labels of prefix to , which is only the publication . After has inserted in its Patricia trie, both tries are equal, resulting in two equal root hashes.

The example shows that it is important at which subscriber the initial CheckTrie request is started.

4.3 Flooding

As an extension to the above approach, we can make use of shortcuts to spread new publications over the ring: Whenever a subscriber generates a new publication , inserts into and broadcasts over the ring, by sending a PublishNew() message to all of its neighbors with . Upon receiving such a PublishNew() message, a subscriber checks if is already stored in . If not, then inserts into and continues to broadcast by forwarding the PublishNew message to its neighbors. In case that is already stored in , just drops the message. By applying this flooding approach on top of the self-stabilizing publish protocol, we can achieve faster delivery of new publications in practice (recall that the skip ring has a diameter of ). Still, if a new subscriber joins a topic, it has to rely on the core BuildSR protocol to receive all publications. Furthermore, note that we do not rely on flooding to show convergence of publications.
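The flooding step can be illustrated with a small Python simulation. It is a sketch under assumptions: the overlay is given as an adjacency dictionary, message delivery is modelled by a local queue, and the local tries are reduced to plain sets of publications. The function name `flood` and its parameters are hypothetical.

```python
from collections import deque


def flood(neighbors, stores, origin, publication):
    """Spread `publication` from `origin` and return the number of messages sent.

    neighbors: dict mapping each subscriber to the subscribers it has edges to
    stores:    dict mapping each subscriber to its set of stored publications
    """
    stores[origin].add(publication)
    queue = deque((origin, n) for n in neighbors[origin])
    messages = 0
    while queue:
        sender, receiver = queue.popleft()
        messages += 1
        if publication in stores[receiver]:
            continue                        # already known: drop the message
        stores[receiver].add(publication)
        queue.extend((receiver, n) for n in neighbors[receiver])
    return messages


# Hypothetical ring of four subscribers without shortcuts.
ring = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
stores = {v: set() for v in ring}
flood(ring, stores, 0, "p")
```

Because a subscriber forwards a publication only the first time it sees it, every subscriber sends the publication to its neighbors at most once, so the number of messages is bounded by the number of directed edges.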

5 Analysis

In this section we show that BuildSR is self-stabilizing according to Definition 1.1. We also show that eventually all subscribers are storing all publications in their respective Patricia tries. The combination of the first two theorems yields that BuildSR is self-stabilizing:

[Network Convergence] Given any initially weakly connected graph with nodes, BuildSR transforms into a skip ring .

Proof.

First of all, note that eventually all corrupted messages are received. Furthermore, a corrupted message cannot trigger an infinite chain of corrupted messages, i.e., eventually the false information is either corrected or received but not spread anymore. We assume this fact for the rest of the proof.

We start with the supervisor and prove the following lemma:

[Supervisor Validity] Eventually the supervisor’s will not be corrupted anymore.

Proof.

We show for every condition that may occur in a corrupted that the condition does not occur anymore at some point in time.

  • Assume that there exists a tuple with . Then this tuple is simply deleted from the database (see Algorithm 3, line 41).

  • Assume that there are entries , and . At some point in time the supervisor wants to send the configuration to (Algorithm 3, line 5), which leads to the supervisor calling CheckMultipleCopies(). Therefore, the supervisor is able to detect the entries , . W.l.o.g. assume that is detected before in the for-loop of CheckMultipleCopies. Then the supervisor just removes the redundant tuple from the , which resolves this condition.

  • Assume there exists , such that for all it holds . In this case, the supervisor is able to detect this corruption: It asks for the entry with label in the (Algorithm 3, line 43). Then the supervisor proceeds as follows: It replaces the label of the entry with maximum with the label (see Algorithm 3, line 45). Note that has to exist in this case, since otherwise , which is a contradiction.

  • Finally, assume that there exists , s.t. there is a tuple with . Then there exists a number , for which there is no tuple , so eventually gets its label changed to .

In the end we showed that all 4 conditions that may occur in a corrupted do not occur in the from some point in time on, which concludes the proof. ∎

Using this lemma, we are ready to prove the convergence of the supervisor:

[Supervisor Convergence] After the supervisor’s has reached a non-corrupted state, all subscribers will eventually become recorded.

Proof.

Recall that we call a subscriber recorded if it is stored in the supervisor’s . Note that the supervisor does not remove subscribers from its (non-corrupted) unless it gets the request to do so, which is not the case as we do not consider unsubscribe operations here.

Let be a non-recorded subscriber. Then either or . Recall the actions described in Section 3.2.1. If , then requests its configuration from the supervisor via action (i) and becomes recorded.

Now let . If does not have connections to other subscribers (i.e., ), then requests its configuration from the supervisor via action (iv), as thinks that it is the subscriber with minimal label, so becomes recorded.

It remains to consider the general case, where we are given a connected component of subscribers. Action (iv) guarantees that quickly contains at least one subscriber that is recorded. As long as the supervisor is able to introduce new recorded subscribers to recorded subscribers in , ’s size grows, but since the number of subscribers is finite, will eventually become static. We show that for such a static connected component , eventually all subscribers will become recorded: Consider the potential function

We show that eventually, . Following the same argumentation from above, it is easy to see that is never increasing. Let for an arbitrary constant . Then all subscribers in eventually form a sorted ring due to BuildRing. This implies that there exists a ring edge from a subscriber that is already contained in the supervisor’s to a subscriber that is not yet contained in the supervisor’s . W.l.o.g. let . Since is not known to the supervisor, the has to contain a different right neighbor for . Let be this neighbor, i.e., as the supervisor sends its correct configuration, it tells that should be its right ring neighbor. But then it has to hold . Note that cannot hold, since this contradicts the fact that subscribers in have already arranged themselves in a sorted ring and no new subscriber can be introduced to a recorded subscriber in , as is already static. This implies that upon receiving its configuration, triggers action (iii) and requests the configuration for at the supervisor, reducing by one and finishing the proof. ∎

Having the supervisor’s converged, we know that the ring of subscribers eventually converges:

[Ring Convergence] After each subscriber has stored its correct configuration from the supervisor the ring induced by edges has converged.

Proof.

The supervisor periodically sends the correct configuration to each subscriber in a round-robin fashion (Algorithm 3, line 5). This implies that after calls of the supervisor’s Timeout procedure, each subscriber has stored its correct label. Note that this does not necessarily include the correct ring neighbors: A subscriber may have received its configuration from the supervisor, but the subscriber stored in (resp. ) may not yet. This may result in modifying via BuildRing, because may not have received its correct label. Since at least all labels are correct now, each subscriber receives its configuration from the supervisor the second time and does not change its list neighbors anymore. ∎

Finally, we need to prove the convergence of the shortcuts for all subscribers:

[Shortcut Convergence] Assume that each subscriber has its correct configuration stored. Then all correct shortcut links will eventually be established at some point in time for all subscribers .

Proof.

Using Lemma 5, we assume that the sorted ring induced by edges is already correctly built. We perform an induction over the levels of shortcuts and show that all shortcuts on each level are eventually established. The induction base () trivially holds, as shortcuts on level are ring edges in . For the induction hypothesis, assume that all shortcuts on level have already been established, i.e., all nodes in already form a sorted ring (recall Definition 2.1). In the induction step we show that all shortcuts on level are eventually established. It is easy to see that holds. Denote the sorted ring over nodes in as . Observe that each node has two neighbors in with . Thus, by definition of our protocol, eventually introduces to and vice versa when calling its Timeout method. This implies that the shortcuts and are established. The above argumentation implies that the ring is established eventually, which concludes the induction. ∎

Having shown the convergence of the supervisor (Supervisor Validity and Supervisor Convergence), of the sorted ring for all subscribers (Ring Convergence) and of the shortcuts for all subscribers (Shortcut Convergence), we have proved the convergence of the overall system (Network Convergence). ∎

[Network Closure] If the explicit edges in already form a supervised skip ring , then they are preserved at any point in time if no subscribers join or leave the system.

Proof.

We need to show closure for the supervisor’s as well as for the skip ring. Again, we start at the supervisor:

[Supervisor Closure] If the explicit edges in already form a supervised skip ring, then the supervisor’s does not get modified anymore, if no subscriber joins or leaves the system.

Proof.

The supervisor’s is only modified, if Subscribe requests for new subscribers arrive at or a subscriber unsubscribes by sending an Unsubscribe request to . Both scenarios are forbidden by assumption of the lemma. ∎

[Ring Closure] If the explicit edges in already form a supervised skip ring, then the set does not get modified anymore, if no subscriber joins or leaves the system.

Proof.

Messages that are generated by the extended BuildRing protocol do not modify the edge set , since closure of the extended BuildRing protocol (Lemma 2.2) holds. Observe that introduction messages for shortcuts do not modify the variables and for a subscriber . Implicit edges generated by configurations sent out by the supervisor are just merged with the existing explicit edges at the receiving subscriber , since already stores the correct configuration. ∎

[Shortcut Closure] If the explicit edges in already form a supervised skip ring, then the set does not get modified anymore, if no subscriber joins or leaves the system.

Proof.

Note that shortcuts are only modified in IntroduceShortcut or (Algorithm 4, line 22). IntroduceShortcut is only called to introduce a node to some shortcut , which already exists, since no node generates an introduction message for two nodes that are not allowed to be connected by a shortcut, as one can easily see via induction. ∎

By combining the three closure lemmas above (Supervisor Closure, Ring Closure and Shortcut Closure), we obtain the Network Closure theorem. ∎

Furthermore, we can show that the delivery of publications is done in a self-stabilizing manner.

[Publication Convergence] Consider an initially weakly connected graph and assume that there are publications in the system, stored at arbitrary subscribers . Then eventually all subscribers store a Patricia trie consisting of all publications in .

Proof.

First note that in our protocol, no publications are deleted from the Patricia tries, i.e., once a subscriber has a publication stored in its Patricia trie , it will never remove from . Therefore, we apply Theorem 5 and assume that the explicit edges in already form a supervised skip ring. For a subscriber let be the set of publications stored in the leaf nodes of . We define the potential of a pair of subscribers by