Population protocols with unreliable communication

02/26/2019 ∙ by Mikhail, et al. ∙ Technische Universität München

Population protocols are a model of distributed computation intended for the study of networks of independent computing agents with dynamic communication structure. Each agent has a finite number of states, and communication opportunities occur nondeterministically, allowing the agents involved to change their states based on each other's states. Multiple variations of that model have been studied. In most of them the situation of temporary impossibility of communication between some agents is natural. On the other hand, the models usually assume atomic interactions, i.e. either all the agents update their state or none do. In practice, ensuring that in case of a communication problem an interaction is recognised as successful either by all participants or by nobody has performance and implementation complexity costs. In the present paper we study unreliable models based on population protocols and their variations from the point of view of expressive power. We model the effects of non-atomic interaction. We show that for a general definition of unreliable protocols with constant-storage agents such protocols can only compute predicates computable by immediate observation population protocols. Immediate observation population protocols are inherently tolerant of unreliable communication and keep their expressive power under a wide range of fairness conditions. We prove it via a structural lemma that can also be applied for other settings requiring guaranteed eventual correctness. We also prove that adding unreliability reduces expressive power non-monotonically, and show that a large class of message-based models becomes strictly less expressive than immediate observation.




1 Introduction

Population protocols were introduced in [1, 2] as a restricted yet useful subclass of general distributed protocols. In such protocols each agent has a constant amount of local storage, and during the protocol execution arbitrary pairs of agents are selected and permitted to interact. The selection of pairs is assumed to be done by an adversary bound by a fairness condition. The fairness condition forces the adversary to permit some protocol progress. A typical fairness condition requires that every configuration that stays reachable during an infinite execution is reached infinitely many times.

Population protocols have since been studied from various points of view, such as expressive power [4], verification complexity [14], time before convergence [3, 12], privacy [9], impact of different interaction scheduling [6] etc. Multiple related models have been introduced. Some of them change or restrict the communication structure: this is the case for immediate, delayed, and queued transmission and observation [4], as well as for broadcast protocols [13]. Some explored the implications of adding limited amounts of storage (below the usual linear or polynomial storage permitted in traditional distributed protocols): this is the case for community protocols [16] (which allow an agent to recognise a constant number of other agents), PALOMA [7] (permitting logarithmic amount of local storage), mediated population protocols [17] (giving some constant amount of common storage to every pair of agents), and others.

One of the main target applications of population protocols and related models is modelling networks of restricted sensors. This is the motivation in the title of the original paper [1] on population protocols. Of course, in the case of a very large number of powerful agents it is also beneficial to have a limited model where complicated failure modes can be explored in advance and avoided. If a model is more restricted, the behaviours are easier to understand intuitively and to describe within a formal proof. Both applications motivate the study of fault tolerance. Many papers on population protocols and related models include a section on fault tolerance, but usually the fault is expected to be either a total agent failure or a byzantine failure. There are even papers such as [8] completely dedicated to total and byzantine agent failures. In practice, there is a different sort of fault that is more frequent than byzantine agents and sometimes more damaging than an agent simply ceasing operation: transaction atomicity violation. Specifically, an agent may think that a single interaction has completed successfully while the other party thinks the interaction has failed and keeps the old state. This problem usually arises because of message loss, as an agent cannot find out whether the lack of confirmation is caused by the original message having been lost in transit or just the confirmation having been lost.

In [10, 11], Di Luna et al. study for many fine-grained classes of multiagent systems with unreliable binary interactions whether they can simulate a given population protocol. In our work we study a different problem, the predicates computable by different classes of protocols. Further, we include broadcast-based and message-based models while restricting agent memory and the types of communication failure, as well as excluding leader-based protocols.

In the present paper we study the expressive power of various models with interacting constant-storage agents when unreliability is introduced. We show an upper bound on the expressive power of unreliable protocols that holds under general assumptions. We show that this upper bound is reached by immediate observation population protocols, a model that inherently tolerates unreliability and is considered a relatively weak model in the fully reliable case. This model also has other nice properties, such as relatively low complexity (PSPACE-complete) of verification tasks [15] and multiple ways to implement it safely on practical hardware.

The rest of the present paper is organised as follows. First we define a general protocol framework and represent various known models inside this framework. Then we summarise the results from the literature on the expressive power of such models. Afterwards we formally define our general notion of unreliable protocol. Then we formulate and prove the common limitation of all the unreliable protocols. This allows us to conclude the proof of the main result in the later section. Afterwards we show that fully asynchronous (message-based) models, such as queued transmission, become strictly less powerful than immediate observation in the unreliable setting. The paper ends with a brief conclusion and some possible future directions.

2 Preliminaries

We consider models in which the number of agents is constant during protocol execution, each agent has a constant amount of local storage, and agents in the same state are indistinguishable to other agents during interactions.

We use a representation where agents can be distinguished and tracked individually for the purposes of analysis, even though they cannot identify each other during the execution of the protocol.

A protocol is specified by a tuple , with components being a finite nonempty set of (individual agent) states, a finite (possibly empty) set of messages, a finite nonempty input alphabet , a finite nonempty output alphabet (which can be assumed to be without loss of generality), an input mapping function , an individual output function , a step relation (which is described in more details below), and a fairness condition on executions.

The protocol defines evolution of populations of agents (possibly with some message packets being present).

A population is a pair of sets: of agents and of packets. A configuration is a population together with two functions, provides agent states, and provides packet contents. Note that if is empty, then must also be empty. As the set of agents is the domain of the function , we use the notation for it. The same goes for the set of packets .

To define the conditions on step relations of protocols, we start by introducing some notation.

For a function and let denote the function defined on such that and . For let denote the function defined on such that and . For symmetry, if let denote restriction .

Use of this notation implies an assertion of correctness, i.e. , , and . We use the same notation with a configuration instead of a function if it is clear from context whether or is modified.

Now we can describe the step relation that tells us which configurations can be obtained from a given one via a single interaction.

The step relation of a protocol is a set of triples , called steps, where and are configurations and is the set of active agents (of the step); agents in , are called passive. We write for , and let denote the projection of that ignores . The step relation must satisfy the following conditions for every step :

  • Agent conservation. .

  • Packet immutability. For every : .

  • Agent and packet anonymity. If and are bijections such that , , , and , then .

  • Possibility to ignore extra packets. For every and : .

  • Possibility to add passive agents. For every agent and there exists such that: .

  • Irrelevance of state of passive agents. For every passive agent and there exists such that:

Informally speaking, the active agents are the agents that transmit something during the interaction. Note that the choice of active agents for each step will not be taken into account until the definition of unreliable protocols.

An execution is a sequence (finite or infinite)

of configurations such that at each moment

either nothing changes, i.e. or a single interaction occurs, i.e. . A configuration is reachable from configuration if there exists an execution with and .

A protocol defines a fairness condition which is a predicate on executions. It should satisfy the following properties.

  • A fair condition is eventual, i.e. every correct finite execution can be continued to an infinite fair execution.

  • A fair condition ensures activity, i.e. if an execution contains only configuration after some moment, there is no configuration such that . (Note that a two-configuration cycle is still allowed).

The default fairness condition accepts an execution if every configuration either becomes unreachable after some moment, or occurs infinitely many times.
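For intuition, the default fairness condition can be checked mechanically on eventually periodic executions. The following Python sketch is illustrative only. It assumes a finite configuration graph given by a hypothetical `successors` function, and an execution that consists of some finite prefix followed by a cycle repeated forever. Under these assumptions the recurrent configurations are exactly those on the cycle, so the execution is fair precisely when nothing outside the cycle is reachable from it.

```python
def reachable_from(start, successors):
    """All configurations reachable from `start` (including `start` itself)."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for nxt in successors(current):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def lasso_is_fair(cycle, successors):
    """Default fairness for an execution 'finite prefix, then `cycle` forever'.

    Assumption of this sketch: `cycle` is a valid execution fragment, i.e.
    each configuration is followed by itself or by one of its successors.
    A configuration outside the cycle occurs only finitely often, so it must
    be unreachable from the cycle; the prefix plays no role.
    """
    recurrent = set(cycle)
    return reachable_from(cycle[0], successors) <= recurrent
```

For example, in a two-configuration graph where each configuration can step to the other, staying in one configuration forever is not fair, while alternating between the two is.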

It is clear that the default fairness condition ensures activity. Let us recall the proof that it is eventual.

[adapted from [4]] Default fairness condition is eventual.


Consider a configuration after a finite execution. If the set of messages is nonempty, fix a countable set of potential packets. Then there is a countable set of possible configurations. Consider an arbitrary enumeration of configurations that mentions each configuration infinitely many times.

We repeat the following procedure: skip unreachable configurations in the enumeration, then perform the steps necessary to reach the next reachable one. If we skip a configuration, it can never become reachable again. Therefore all the configurations that stay reachable infinitely long are never skipped and therefore they are reached infinitely many times. ∎

It is easy to show that for a protocol without messages the default fairness condition provides executions similar to random ones.

A configuration is recurrent in an execution if it occurs infinitely many times.

Assume there are no messages in the protocol, for every configuration every permitted step has a positive probability independent of the previous execution, and the default fairness condition is used.

Then a random infinite execution is fair with probability 1. On the other hand, for every fair run its set of recurrent configurations has a positive probability to be the set of recurrent configurations of a random run.


Note that during the execution the set of the agents stays the same, so there is a finite number of potentially reachable configurations and there is an upper bound on the number of steps needed to reach any reachable configuration. Therefore there is a positive global lower bound on the probability, at any moment, of reaching any given reachable configuration within that number of steps. If a configuration stays reachable infinitely often, with probability 1 it is reached infinitely many times.

On the other hand, in a fair execution all recurrent configurations can reach each other and no other configurations are reachable from the recurrent ones. A random execution has a positive probability of reaching one of these recurrent configurations. If that happens, with conditional probability 1 the random execution is fair and has the same set of recurrent configurations as the given fair execution. ∎

Assume that for every configuration every permitted step has a positive probability independent of the previous history. Then the condition of activity has probability 1, and every condition with probability 1 is eventual.


If an execution gets stuck at a configuration despite having a permitted step that changes it, there is a fixed positive probability at every future moment to take that step, so repeating that configuration forever has probability 0.

Every permitted finite execution has probability equal to a product of positive probabilities corresponding to the individual steps. This product is positive. Therefore a condition with probability 1 cannot be mutually exclusive with observing this finite execution as a prefix. ∎

An input configuration is a configuration where there are no packets and all agents are in input states, i.e. and . We extend to be applicable to multisets of input symbols. For every , we define to be a configuration of agents with agents in input state (and no packets).

A configuration is a consensus if the individual output function yields the same value for the states of all agents, i.e. or just . This value is the output value for the configuration. A consensus is a stable consensus if all configurations reachable from are consensus configurations with the same output value.

A protocol implements a function if for every every fair execution starting from reaches a stable consensus with the output value . A protocol is well-specified if it implements some function.

Now let us define the previously studied models to show that our framework has sufficient generality.

Population protocol is described by an interaction relation . The set of messages is empty. A configuration can be obtained from , if there are agents and states such that , , , and . The set of active agents is .
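The definitions of a population protocol step, reachability, and stable consensus can be made concrete for small populations. The Python sketch below is an illustration under stated assumptions, not the formalism of the paper: configurations are tuples of states indexed by agent, the interaction relation is a hypothetical dictionary `delta` mapping an ordered pair of states to the set of possible result pairs, and `output` is a dictionary playing the role of the individual output function. There are no packets.

```python
from itertools import permutations


def successors(config, delta):
    """Configurations obtained from `config` by one pairwise interaction."""
    result = set()
    for i, j in permutations(range(len(config)), 2):
        for new_i, new_j in delta.get((config[i], config[j]), ()):
            nxt = list(config)
            nxt[i], nxt[j] = new_i, new_j
            result.add(tuple(nxt))
    return result


def reachable(config, delta):
    """All configurations reachable from `config`, including itself."""
    seen = {config}
    frontier = [config]
    while frontier:
        current = frontier.pop()
        for nxt in successors(current, delta):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def is_stable_consensus(config, delta, output):
    """True if every agent in every reachable configuration has one output."""
    values = {output[q] for c in reachable(config, delta) for q in c}
    return len(values) == 1
```

With the one-rule relation `{('1', '0'): {('1', '1')}}` (an agent in state `'0'` that meets an agent in state `'1'` switches to `'1'`), a population containing both states is not yet a stable consensus, while the all-`'1'` and all-`'0'` populations are.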

Sometimes the fairness condition is formulated only in terms of step relation instead of reachability, but the equivalence is obvious by induction. Fully anonymous descriptions of population protocols require the same about multisets of states, which is an equivalent condition because the number of agents is finite and constant.

Immediate transmission population protocol is a population protocol such that depends only on , i.e. the following two conditions hold. If and then . If then for every exists such that .

Immediate observation population protocol is an immediate transmission population protocol such that every possible interaction has .

We can consider only the first agent to be active.

Queued transmission protocol has a nonempty set of messages. It has two transition relations: describing sending the messages, and describing receiving the messages. If agent has state and , it can send a message as a fresh packet and switch to state : . If agent has state , packet contains message and , agent can receive the message: .

Delayed transmission protocol is a queued transmission protocol where every message can always be received by every agent, i.e. the projection of to is the entire .

Delayed observation protocol is a delayed transmission protocol where sending a message doesn’t change state, i.e. implies .

Note that as the number of messages can be arbitrarily large, the fairness condition formulated in terms of reachability via a single step is not equivalent to the default fairness condition.

Broadcast protocol is defined by two relations: describing a sender transition, and . To perform a step from a configuration , we pick an agent with state and change its state to such that . At the same time, we simultaneously update the state of all other agents, in such a way that an agent in state can switch to any state such that .

We consider the transmitting agent to be the only active one.

In the literature, the relations , , and are sometimes required to be partial functions. As we use relations in the general case, we use relations here for consistency.

3 Expressive power of population protocols and related models

In this section we give an overview of previously known results on expressive power of various models related to population protocols. We only consider predicates, i.e. functions with the output values being and because the statements of the theorems become more straightforward in that model.

[see [4] for details] Population protocols and queued transmission protocols can implement precisely semilinear predicates.

Immediate transmission population protocols and delayed transmission protocols can implement precisely all the semilinear predicates that are also in . Roughly speaking, is the class of predicates that become equivalent to modular equality for inputs with only large and zero components.

Immediate observation population protocols can implement precisely all counting predicates. Counting predicates are logical combinations of inequalities including one coordinate and one constant each.

Delayed observation protocols can implement precisely the counting predicates where every constant is equal to 1.

[see [5] for details] Broadcast protocols implement precisely the predicates computable in nondeterministic linear space.

4 Our models

4.1 Proposed models

We propose a general notion of an unreliable version of a protocol.

Unreliable protocol, corresponding to a protocol , is a protocol that differs from only in the step relation. For every allowed step we also allow all the steps where satisfies the following conditions.

  • Population preservation. , .

  • State preservation. For every agent : .

  • Message preservation. For every packet : .

  • Reliance on active agents. For every agent if then .

The first three conditions formalise the idea that is just where some agents failed to update their state. The last condition says that for some passive agent to receive a transmission, the transmissions have to occur (and transmitting parties reliably notice that).
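The conditions above can be turned into a small enumeration procedure. The following Python sketch is a simplified reading and rests on assumptions: packets are left out, a configuration is a dictionary from agent names to states, and reliance on active agents is interpreted as "a passive agent may change its state only if every active agent ends in its updated state". The function name `unreliable_variants` is hypothetical.

```python
from itertools import product


def unreliable_variants(before, after, active):
    """Enumerate the unreliable outcomes of one reliable step.

    before, after: dicts mapping each agent to its state (same agents).
    active: set of active agents of the step.
    """
    agents = sorted(before)
    variants = []
    for choice in product([False, True], repeat=len(agents)):
        # State preservation: each agent keeps its old state or takes the new one.
        result = {a: (after[a] if updated else before[a])
                  for a, updated in zip(agents, choice)}
        passive_changed = any(result[a] != before[a]
                              for a in agents if a not in active)
        active_done = all(result[a] == after[a] for a in active)
        # Reliance on active agents (as interpreted in this sketch).
        if passive_changed and not active_done:
            continue
        if result not in variants:
            variants.append(result)
    return variants
```

When both agents of a pairwise interaction are active, all four combinations of updated and non-updated agents are produced. When only the sender is active, the outcome where the receiver updates but the sender does not is excluded.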

Although it is possible to invent a model of interaction where reliance on active agents doesn’t fully capture the desired semantics, the implications match the intuition for the models previously studied in the literature. Also note that we never fail to send or consume messages. We err on the side of a stronger requirement on , because a weaker restriction reduces the class of unreliable protocols and we make our claims in a more general case.

Unreliable immediate observation population protocols do not differ from ordinary immediate observation population protocols, because each step changes the state of only one agent. Failing to change the state means performing a no-change step which is already allowed anyway.

Unreliable population protocols allow an interaction to update the state of only one of the two agents.

Unreliable immediate transmission population protocols allow the sender to update the state with no receiving agents.

Unreliable queued transmission protocols allow messages to be discarded with no effect. Note that for unreliable delayed observation it doesn’t change much, as sending the messages also has no effect.
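To illustrate, here is a Python sketch of one-step successors in a queued transmission protocol, with an optional unreliable mode. It is a sketch under assumptions: a configuration is a tuple of agent states plus a sorted tuple of messages in transit, and both relations are given as hypothetical sets of `(old_state, message, new_state)` triples. In unreliable mode a packet is still created or consumed, but the agent may keep its old state.

```python
def queued_successors(states, packets, send, receive, unreliable=False):
    """One-step successors of a queued transmission configuration.

    states:  tuple of agent states, indexed by agent
    packets: sorted tuple of messages currently in transit
    send, receive: sets of (old_state, message, new_state) triples
    """
    result = set()
    for i, q in enumerate(states):
        def with_state(new_q):
            # Same population, with agent i moved to state new_q.
            return states[:i] + (new_q,) + states[i + 1:]

        for old, msg, new in send:
            if old != q:
                continue
            more = tuple(sorted(packets + (msg,)))
            result.add((with_state(new), more))
            if unreliable:
                # The packet is sent, but the sender fails to update.
                result.add((with_state(q), more))
        for old, msg, new in receive:
            if old != q or msg not in packets:
                continue
            fewer = list(packets)
            fewer.remove(msg)
            fewer = tuple(fewer)
            result.add((with_state(new), fewer))
            if unreliable:
                # The packet is consumed with no effect on the receiver.
                result.add((with_state(q), fewer))
    return result
```

The second unreliable branch corresponds to a message being discarded with no effect.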

Unreliable broadcast protocols allow a broadcast to be received by an arbitrary subset of agents.

Note that there are natural weaker notions of fairness for unreliable protocols. We require that every configuration that is always reachable occurs infinitely often; we could restrict the condition and only require occurrences of configurations reachable by successful interactions.

Our results hold also for non-default fairness conditions.

4.2 The main result

Our main result is that no class of unreliable protocols can be more expressive than immediate observation protocols. We also observe that the expressive power of immediate observation protocols does not depend on reliability or on the specific fairness condition.

A cube is a subset of defined by a lower and upper (possibly infinite) bound for each coordinate.

A counting set is a finite union of cubes.

A counting function is a function such that the complete preimage of each possible value is a counting set.

The set of functions implementable by any of the types of unreliable protocols is at most the set of counting functions.
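The notions of cube and counting set used in this statement are easy to make concrete. In the Python sketch below, which is only an illustration, a cube is represented as a list of `(lower, upper)` pairs with `None` standing for an infinite upper bound. Treating both bounds as inclusive is an assumption of the sketch.

```python
def in_cube(point, cube):
    """True if every coordinate lies within its (inclusive) bounds."""
    return all(lo <= x and (hi is None or x <= hi)
               for x, (lo, hi) in zip(point, cube))


def in_counting_set(point, cubes):
    """A counting set is a finite union of cubes."""
    return any(in_cube(point, cube) for cube in cubes)


# Example: the set of pairs (x, y) with x >= 2 or y == 0,
# written as a union of two cubes.
EXAMPLE_CUBES = [
    [(2, None), (0, None)],  # x >= 2, y arbitrary
    [(0, None), (0, 0)],     # x arbitrary, y == 0
]
```

A predicate is then a counting function when the inputs mapped to each output value form such a union.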

5 The shadow agent lemma

In this section we present our core structural lemma for the models under consideration.

Let be an arbitrary execution of protocol  with initial configuration . Let be an agent in this execution. Let be an agent, and . A set of executions starting in configuration is a shadow extension of the execution  around the agent  if the following conditions hold.

  • formally removing from each configuration in any execution from yields ;

  • at every step there is an execution in such that and are in the same state.

The added agent is a shadow agent, and elements of are shadow executions. A protocol  is shadow-permitting if for every configuration  there is a fair execution starting from that has a shadow extension around each agent .

Note that the executions in might not be fair even if is fair.

All unreliable protocols are shadow-permitting.

Proof of lemma 5.

We construct an execution and the families in parallel, then show that the resulting execution is fair. Our process ensures that in each the set of states assigned to in at least one of the executions in can only grow.

We build the execution using existence of fair continuations. We pick a fair continuation and insert its steps one by one. Under some conditions we choose to insert different steps into and restart the procedure. Only a finite number of restarts can occur, so eventually we obtain a fair execution.

On each step we examine the step that the execution should make according to the fair continuation procedure. If for every there is already an execution where the last configuration satisfies , we add as the next configuration of . To each shadow execution ending with we add configuration . holds because of possibility to add a passive agent and to not update its state because of unreliability.

If some reaches a new state, there are two cases. Either all agents reaching states not already reached by their shadows are passive, or some of them are active.

If all such agents are passive, we only update states of active agents. Namely, we create a modified step with if and otherwise, . Note that because of unreliability. We add as the next configuration in . Like before, to every execution in a shadow extension ending with we add . We also add some new executions to the shadow extensions. Consider a passive agent that would reach a state not yet reached by in any execution in . We pick the execution in the family (before the addition of the -derived configurations) with the last configuration such that . We construct . , because we can add to the step via possibility to add passive agents, then swap and using agent anonymity and equality of state in so that gets the state , then apply unreliability to make sure that all the passive agents except keep their old states. We add the configuration to , and add the resulting execution to .

If some of the agents reaching new states are active, we block the updates of all agents via unreliability (instead of only passive ones) and add this trivial step to , then update the shadow extensions like in the previous case, but blocking all updates (possibly except ) instead of only the updates of passive agents.

By construction, we always add such steps to that removing yields the corresponding step from . We also ensure that no agent can reach a state not reached by its shadow in one of the executions in the shadow extension. Every time we do not copy the next configuration from a fair execution, there is some such that the number of states that agent can reach in at least one execution increases. Therefore we can do it at most times. Afterwards we will just copy the entire fair continuation to , and obtain a fair execution.

This concludes the proof of the lemma. ∎

6 Proof of the main result

6.1 Truncation lemma

In this section we briefly recall (and extend) the truncation lemma from [4]. The idea of the truncation lemma is that large numbers of agents are indistinguishable for the notion of stable consensus.

A protocol is truncatable if there exists a number such that a stable consensus cannot be destroyed by adding an extra agent with a state already represented by at least other agents.

Every protocol is truncatable.


Every configuration can be summarised by an element of . All configurations corresponding to the same multiset of states are simultaneously consensus or not, and simultaneously stable consensus or not. The set of elements of not corresponding to a stable consensus is upwards closed, because reaching a state with a different local output value cannot be impeded by adding agents. Indeed, if we can reach a configuration with some state present, we can always use addition of passive agents to each step of the path and still have a path of valid steps from a larger configuration to some configuration with state still present. By Dickson’s lemma, the set of non-stable-consensus state multisets has a finite set of minimal elements . We can take larger than all coordinates of all minimal elements. Then adding more agents with the state that already has at least agents leads to increasing a component larger than in the multiset of states. This cannot change any component-wise comparison with the minimal elements, and therefore cannot change whether the multiset corresponds to a stable consensus. ∎
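The final step of this argument can be checked on a small example. The Python sketch below is illustrative: it assumes an upward-closed set of state multisets described by a hypothetical list of minimal elements, and verifies that capping every coordinate at a number larger than all coordinates of all minimal elements does not change membership.

```python
def dominates(vector, minimal):
    """Component-wise comparison: vector >= minimal in every coordinate."""
    return all(a >= b for a, b in zip(vector, minimal))


def in_upward_closure(vector, minimal_elements):
    """Membership in the upward-closed set generated by the minimal elements."""
    return any(dominates(vector, m) for m in minimal_elements)


def truncate(vector, bound):
    """Cap every coordinate at `bound`."""
    return tuple(min(a, bound) for a in vector)


# Hypothetical minimal elements over two states; 4 exceeds every coordinate.
MINIMAL = [(2, 0), (0, 3)]
BOUND = 4
```

For every vector `v`, `in_upward_closure(v, MINIMAL)` and `in_upward_closure(truncate(v, BOUND), MINIMAL)` agree, which is the component-wise comparison used in the proof.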

6.2 Expressive power of immediate observation population protocols

All counting predicates can be implemented by (unreliable) immediate observation protocols, even if the fairness condition is replaced with an arbitrary different (activity-ensuring) one.


We have already observed that immediate observation population protocols do not change if we add unreliability. It was shown in [4] that immediate observation population protocols implement all counting predicates. Moreover, the protocol provided there for threshold functions has the state of each agent increase monotonically. It is easy to see that ensuring activity is enough for this protocol to converge to a state where no more configuration-changing steps can be taken. Also, the construction for boolean combination of predicates via direct product of protocols used in [4] converges as long as the protocols for the two arguments converge. Therefore it doesn’t need any extra restrictions on the fairness condition. ∎
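The monotonicity argument can be illustrated with a threshold protocol in the spirit of the construction referenced above. The rules in the following Python sketch are an assumption of this illustration and are not quoted from [4]: every agent starts at level 1, an agent that observes another agent at its own level moves up one level, and an agent that observes level k jumps to level k. An agent outputs true exactly when it is at level k.

```python
def run_threshold(n, k):
    """Run observations until no observation can change any state.

    n: number of agents, all starting at level 1
    k: threshold (k >= 1); the final levels are returned as a list
    """
    levels = [1] * n
    changed = True
    while changed:
        changed = False
        for observer in range(n):
            for observed in range(n):
                if observer == observed:
                    continue
                mine, seen = levels[observer], levels[observed]
                if seen == k and mine < k:
                    levels[observer] = k          # copy the final level
                    changed = True
                elif mine == seen and mine < k:
                    levels[observer] = mine + 1   # two agents share a level
                    changed = True
    return levels


def threshold_output(n, k):
    """Set of output values in the final configuration (a singleton on consensus)."""
    return {level == k for level in run_threshold(n, k)}
```

Levels only increase, so the loop terminates for any order of observations, and a skipped update is just a step that changes nothing. In the final configuration all agents agree that the threshold holds exactly when `n >= k`.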

6.3 End of the proof

An -ary function implemented by a shadow-permitting truncatable protocol is a counting function.


Let be the truncation constant. We claim that distinguishes only numbers in each input position. This naturally defines cubes where is constant.

More specifically, we prove an equivalent statement: adding to an argument already larger than doesn’t change the output value of . Indeed, consider any corresponding input configuration. We can build a fair execution starting in it with shadow extensions around each agent. As the function is correctly implemented, this fair execution has to reach a stable consensus. By assumption (and pigeonhole principle), more than agents from the chosen input state end up in the same state. By definition of shadow extension, there is an execution starting with one more agent in the chosen input state, and reaching the same stable consensus but with one more agent in the state with more than other ones (which doesn’t break the stable consensus). Continuing this finite execution to a fair execution we see that the value of must be the same. This concludes the proof. ∎

Theorem 4.2 now follows from the fact that all the unreliable protocols are shadow-permitting and truncatable.

7 Expressive power of unreliable message-based models

In this section we observe that while delayed transmission protocols and queued transmission protocols are more powerful than immediate observation population protocols, their unreliable versions are strictly less expressive than (unreliable) immediate observation population protocols. We prove an even more general statement: a well-specified unreliable protocol with a single input state where each step depends on the state of only a single agent (but possibly also an arbitrary number of packets) cannot distinguish a one-agent configuration from a two-agent configuration with both agents sharing the input state.

A protocol is fully asynchronous if for each allowed step the following conditions hold.

  • There is exactly one active agent, i.e. .

  • No passive agents change their states.

  • Either the packets are only sent or the packets are only consumed, i.e. or .

In the message-based models considered in the literature, it happens to be impossible to consume a packet and send a different one during the same step. We keep that property to avoid modelling a transmission failure as a separate type of failure (it is enough that a receiver may consume some packets without updating the state).

A well-specified unreliable fully asynchronous protocol with a single-letter input alphabet yields the same value for the input tuples (1) and (2).

The core idea of the proof is to ensure that in a reachable situation rare messages do not exist and cannot be created. In other words, if there is a packet with some message, there are many packets with the same message. This makes irrelevant the production of new messages by agents, and the number of agents present.

In-degree of a fully asynchronous protocol is the maximum number of messages consumed in a single step.

Supply of a message in configuration is the number of packets in with the message , i.e. .

Let . An abundance set is the largest set such that the supply of each message in is at least where is the in-degree. As decreases in the last argument, the abundance set is well-defined. A message is abundant in configuration if it is in the abundance set, i.e. . A message is expendable at some moment in execution if it is abundant in some configuration that has occurred in before that moment. A packet is expendable if it bears an expendable message.

An execution is careful if no step that decreases the supply of non-expendable messages changes agent states.

Every unreliable fully asynchronous protocol with a single-letter input alphabet has a careful fair execution starting from the configuration with a single agent in the only input state.


We start with an execution with only the initial configuration.

In the first part of the construction, the algorithm asks if it is possible to create a new packet with a non-expendable message (without making the execution careless). It is clear that consuming a non-expendable message is never necessary for that, because the consumption step cannot change the state by definition of carefulness, and cannot create new packets by definition of a fully asynchronous protocol. If we can create a new non-expendable packet, we do it. Note that this cannot take more than steps, as all the expendable packets are already available for consumption, and there is no reason to repeat the same internal state twice. Therefore creating an additional non-expendable packet can consume at most packets. Repeating this times consumes at most expendable packets and creates a new message with supply at least . As , all the expendable messages together with this message form an abundance set.

At some point it becomes impossible to grow the abundance set, and therefore at some point it also becomes impossible to create new non-expendable packets. Then we start asking whether it is possible to consume a non-expendable packet. We visit all the reachable states (the steps consume at most expendable packets), performing all possible steps that consume non-expendable packets. Whenever we consume a non-expendable packet, we use unreliability to leave the agent state unchanged. Note that consuming a non-expendable packet requires consuming at most expendable packets. As the supply of each non-expendable message is less than , we consume at most . Therefore we still have more than packets with each expendable message left, no non-expendable packets that can be received in a reachable state, and no possibility of creating a non-expendable packet.

We can now spend at most expendable packets to reach a stable consensus. We call that moment the target moment. Afterwards we just take an arbitrary fair continuation. ∎

Now we can prove the theorem.

Proof of Theorem 7.

Consider a fair careful execution for the single-agent configuration. Because we can always add passive agents and messages in any protocol, and because in a fully asynchronous protocol the passive agents won’t change their states, we can run two copies of it as a single two-agent execution.

Consider the moment when both agents have reached the target moment. Any state reachable for the pair of agents is reachable for the single agent, because we have a sufficient supply of all messages that can still be received. Therefore we can perform the necessary number of state changes (at most ) without needing to create new packets.

Therefore it is possible to have the same computation result with two agents and with a single one. By definition of well-specification that means that the function value is the same, which concludes the proof. ∎

Note that if we prohibit an agent from receiving its own messages, repeating the same construction with insignificant changes shows that populations of and agents cannot be distinguished.

This result does not mean that the fundamentally asynchronous nature of communication prevents us from using any expressive models for verification of unreliable systems. In practice it is usually possible to keep enough state to implement, for example, unreliable immediate observation via request and response.

8 Conclusion and future directions

We have studied unreliability based on atomicity violations, a new, practically motivated approach to fault tolerance in population protocols. We have shown that, within a general framework for defining unreliable protocols, we can prove a specific structural property that bounds the expressive power of unreliable protocols by the expressive power of immediate observation population protocols. Immediate observation population protocols permit verification of many useful properties, up to well-specification, correctness and reachability between counting sets, in polynomial space. We think that the relatively low complexity of verification, together with inherent unreliability tolerance and locally optimal expressive power under atomicity violations, motivates further study and use of such protocols.

It is also interesting to explore whether, for some class of protocols, adding unreliability makes some of the verification tasks easier. Both the complexity and the expressive power implications of unreliability can be studied for models with larger state, such as community protocols, PALOMA and mediated population protocols. We also believe that some models even more restricted than community protocols, but still permitting a multi-interaction conversation, are an interesting object of study both from the classical point of view and from the point of view of unreliability.