We consider population protocols  for exact-majority voting. The underlying computation system consists of a population of anonymous (i.e. identical) agents, or nodes, and a scheduler which keeps selecting pairs of nodes for interaction. A population protocol specifies how two nodes update their states when they interact. The computation is a (perpetual) sequence of interactions between pairs of nodes. The objective is for the whole system to eventually stabilize in configurations which have the output property defined by the considered problem. In the general case, the nodes can be connected according to a specified graph and two nodes can interact only if they are joined by an edge. Following the scenario considered in most previous work on population protocols, we assume the complete communication graph and the random uniform scheduler. That is, each pair of (distinct) nodes has equal probability to be selected for interaction in any step and each selection is independent of the previous interactions.
The model of population protocols was proposed by Angluin et al.  and has subsequently been studied extensively to establish its computational power and to design efficient solutions for fundamental tasks in distributed computing, such as various types of consensus-reaching voting. The survey by Aspnes and Ruppert  includes examples of population protocols, early computational results and variants of the model. The main design objectives for population protocols are a small number of states and a fast stabilization time. The original definition of the model assumes that the agents are copies of the same finite-state automaton, so the number of states (per node) is constant. This requirement was later relaxed by allowing the number of states to increase (slowly) with the population size, in order to study trade-offs between the memory requirements and the run times.
The (two-opinion) exact-majority voting is one of the basic settings of consensus voting [3, 4, 5]. Initially each node is in one of two distinct states and , which represent two distinct opinions (or votes) and , with nodes holding opinion (starting in the state ) and nodes holding opinion . We assume that and denote the initial imbalance between the two opinions by . The desired output property is that all nodes have the opinion of the initial majority. An exact majority protocol should guarantee that the correct answer is reached, even if the difference between and is only (cf. ). In contrast, approximate majority would require correct answer only if the initial imbalance is sufficiently large. In this paper, when we refer to “majority” (protocol, or voting, or problem) we always mean the exact-majority notion.
We will now give further formalization of a population protocol and its time complexity. Let denote the set of states, which can grow with the size of the population (but keeping it low remains one of our objectives). Let denote the state of a node at step (that is, after individual interactions). Two interacting nodes change their states according to a common deterministic transition function . A population protocol has also an output function , which is used to specify the desired output property of the computation. For majority voting, , which means that a node in a state assumes that is the majority opinion. The system is in an (output) correct configuration at a step , if for each , is the initial majority opinion. We consider undirected individual communications, that is, the two interacting nodes are not designated as initiator and responder, so the transition functions must be symmetric. Thus if , then , implying, for example, that .
We say that the system is in a stable configuration if no node will ever again change its output. The computation continues (since it is perpetual) and nodes may continue updating their states, but if a node changes from one state to another, then its output remains the same. Thus a majority protocol is in a correct stable configuration if all nodes output the correct majority opinion and will do so in all possible subsequent configurations. Two main types of output guarantee categorize population protocols as either always correct, if they reach the correct stable configuration with probability 1, or w.h.p. correct. A protocol of the latter type reaches a correct stable configuration w.h.p. (see Footnote 1), allowing that with some low but positive probability an incorrect stable configuration is reached or the computation does not stabilize at all.
Footnote 1: A property , e.g. that a given protocol reaches a stable correct configuration, holds w.h.p. (with high probability), if it holds with probability at least , where constant can be made arbitrarily large by changing the constant parameters in (e.g. the constant parameters of a protocol).
The notion of the time complexity of population protocols which has been used recently to derive lower bounds on the number of states [1, 2], and which we also use in this paper, is the stabilization time, defined as the first round when the system enters a correct stable configuration (see Footnote 2). We follow the common convention of defining the parallel time as the number of interactions divided by the number of nodes. Equivalently, we group the interactions into rounds, each consisting of as many interactions as there are nodes, also called (parallel) steps, and take the number of rounds as the measure of time. In our analysis we also use the term period, which we define as a sequence of consecutive interactions, but not necessarily aligned with rounds.
Footnote 2: Some previous papers (e.g. [1, 10]) refer to this stabilization time as the convergence time.
The main result of this paper is a majority protocol with stabilization time w.h.p. and in expectation, for some constant (here specifically ), while using logarithmically many states. According to  this number of states is asymptotically optimal for protocols with , and to the best of our knowledge this is the first result that offers stabilization in time with poly-logarithmic state space.
1.1 Previous work on population protocols for the majority problem
Draief and Vojnović  and Mertzios et al.  analyzed two similar four-state majority protocols. Both protocols are based on the idea that the two opinions have weak versions and in addition to the main strong versions and . The strong opinions are viewed as tokens moving around the graph. Initially each node has a strong opinion or , and during the computation it has always one of the opinions , , or (so is in one of these four states). The strong opinions have dual purpose. Firstly, two interacting opposite strong opinions cancel each other and change into weak opinions. Such pairwise canceling ensures that the difference between the number of strong opinions and does not change throughout the computation (remaining equal to ) and eventually all strong opinions of the initial minority are canceled out. Secondly, the surviving strong opinions keep moving around the graph, converting the weak opposite opinions.
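The cancel-and-convert dynamics described above can be illustrated with a small simulation. The following is only a minimal sketch under the complete-graph, uniform-scheduler assumptions used in this paper. The state names "A", "B" (strong) and "a", "b" (weak), the function names, and the choice that a canceled strong opinion becomes the weak version of itself are illustrative assumptions, not the exact transition tables of the cited protocols. Token movement is omitted, since on the complete graph with anonymous nodes it does not change the multiset of states.

```python
import random


def four_state_step(x, y):
    """Symmetric transition on an unordered pair of states (illustrative)."""
    # Two opposite strong opinions cancel into weak opinions.
    if {x, y} == {"A", "B"}:
        return x.lower(), y.lower()
    # A strong opinion converts an opposite weak opinion.
    if x == "A" and y == "b":
        return "A", "a"
    if x == "b" and y == "A":
        return "a", "A"
    if x == "B" and y == "a":
        return "B", "b"
    if x == "a" and y == "B":
        return "b", "B"
    # All other pairs leave both states unchanged.
    return x, y


def simulate_four_state(num_a, num_b, seed=0, max_interactions=1_000_000):
    """Run the sketch until all outputs agree; return (opinion, parallel time).

    Returns (None, parallel time) if the interaction cap is hit first.
    """
    rng = random.Random(seed)
    states = ["A"] * num_a + ["B"] * num_b
    n = len(states)
    for t in range(1, max_interactions + 1):
        # Uniform scheduler: a random pair of distinct nodes.
        i, j = rng.sample(range(n), 2)
        states[i], states[j] = four_state_step(states[i], states[j])
        # The output of a node is its opinion, ignoring strong/weak.
        outputs = {s.upper() for s in states}
        if len(outputs) == 1:
            # Parallel time = interactions divided by the number of nodes.
            return outputs.pop(), t / n
    return None, max_interactions / n
```

For example, `simulate_four_state(6, 5)` starts from an initial imbalance of one vote. Because canceling removes one strong opinion of each type, the difference between the numbers of strong opinions never changes, so once all outputs agree they agree on the initial majority.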
Mertzios et al.  call their protocol the 4-state ambassador protocol (the strong opinions are ambassadors) and prove the expected stabilization time for any graph and for the complete graph. Draief and Vojnović  call their 4-state protocol the binary interval consensus, viewing it as a special case of the interval consensus protocol of Bénézit et al. , and analyze it in the continuous-time model. For the uniform edge rates (the continuous setting which is roughly equivalent to our setting of one random interaction per one time unit) they show that the expected stabilization time for the complete graph is at most . They also derive bounds on the expected stabilization time for cycles, stars and Erdős-Rényi graphs.
The appealing aspect of the four-state majority protocols is their simplicity and the constant-size local memory, but the downside is polynomially slow stabilization if the initial imbalance is small. The stabilization time decreases if the initial imbalance increases, so the performance would be improved if there was a way of boosting the initial imbalance. Alistarh et al.  achieved such boosting by multiplying all initial strong opinions by the integer parameter . The nodes keep the count of the number of strong opinions they currently hold. When eventually all strong opinions of the initial minority are canceled, strong opinions of the initial majority remain in the system. This speeds up both the canceling of strong opinions and the converting of weak opinions of the initial minority, but the price is the increased number of states. Refining this idea, Alistarh et al.  obtained a majority protocol which has the stabilization time w.h.p. and in expectation and uses states.
A suite of polylogarithmic-time population protocols for various functions, including the exact majority, was proposed by Angluin et al. . Their protocols are w.h.p. correct and, more significantly, require a unique leader to synchronize the progress of the computation. Their majority protocol w.h.p. reaches a correct stable configuration within time (with the remaining low probability, it either needs more time to reach the correct output or it stabilizes with an incorrect output) and requires only a constant number of states, but the presence of the leader node is crucial.
The protocols developed in  introduced the idea of alternating cancellations and duplications, which has been frequently used in subsequent majority protocols and forms also the basis of our new protocol. This idea has the following interpretation within the framework of canceling strong opinions. The canceling stops when it is guaranteed that w.h.p. the number of remaining strong opinions is less than , for some small constant . Now the remaining strong opinions duplicate: if a node with a strong opinion interacts with a node which does not hold a strong opinion, then both nodes get the same strong opinion. This duplicating stops when it is guaranteed, again w.h.p., that all initial strong opinions have been duplicated. One phase of (partial) cancellations followed by (complete) duplications takes w.h.p. time, and repetitions of this phase increases the difference between the numbers of strong opinions and to . With such large imbalance between strong opinions, w.h.p. within additional time the minority opinion is completely eliminated and the majority opinion is propagated to all nodes.
Bilke et al.  showed that the cancellation-duplication framework from  can be implemented without a leader if the agents have enough states to count their interactions. They obtained a majority protocol which has stabilization time w.h.p. and in expectation, and uses states. Berenbrink et al.  considered population protocols for the plurality voting, which generalizes the majority voting to opinions. Using the methodology introduced earlier for load balancing , they generalized the previous results on majority protocols by working with multiple opinions and arbitrary graphs, showing also only time w.h.p. for the case of complete graphs and . Their protocol, however, requires a polynomial number of states and initial advantage of the most common opinion to achieve time. Recently Alistarh et al.  have shown that any majority protocol which has expected stabilization time of , where is any positive constant, and satisfies technical conditions of monotonicity and output dominance, requires states. They have also presented a protocol which uses only states and has stabilization time w.h.p. and in expectation.
The lower and upper bounds shown in Alistarh et al.  raised the following questions. Can exact majority be computed in poly-logarithmic time with states, if the time complexity is measured in some other natural and relevant way than the time till (correct) stabilization? Can exact majority be computed in time with poly-logarithmic states? (The protocol in  and all earlier exact majority protocols which use poly-logarithmic number of states have time complexity at least of the order of .) For a random (infinite) sequence of interaction pairs, let denote the convergence time, defined as the first round when (at some interaction during this round) the system enters a correct configuration (all nodes correctly output the majority opinion) and remains in correct configurations in all subsequent interactions (of this sequence ). Clearly , since reaching a correct stable configuration implies that whatever the future interactions may be, the system will always remain in correct configurations.
Very recently Kosowski and Uznański  and Berenbrink et al.  have shown that the convergence time can be poly-logarithmic while using states. In  the authors design a programming framework and accompanying compilation schemes that provide a simple way of achieving protocols (including majority) which are w.h.p. correct, use states and converge in expected poly-logarithmic time. They can make their protocols always-correct at the expense of having to use states per node, while keeping poly-logarithmic time, or increasing time to , while keeping a constant bound on the number of states. In  the authors design an always-correct majority protocol which converges w.h.p. in time and uses states and an always-correct majority protocol which stabilizes w.h.p. in time and uses states, where parameter .
The recent population protocols for majority voting often use similar technical tools (mainly the same efficient constructions of phase clocks) as protocols for another fundamental problem of leader election. There are, however, notable differences in computational difficulty of both problems, so advances in one problem do not readily imply progress with the other problem. For example, leader election admits always-correct protocols with poly-logarithmically fast stabilization and only states (the lower bound here is only ). There are some general ideas, recently explored in , which indicate that in leader election expected run times of order significantly better than can be achieved (though the w.h.p. time would remain ). Those ideas, however, are specific for leader election and not applicable to majority voting.
1.2 Our contributions
We present a majority population protocol with stabilization time w.h.p. and in expectation, using asymptotically optimal states. This is the first state-space optimal protocol for majority with stabilization time . In fact, to the best of our knowledge, there is no other majority protocol with states and time , even for the weaker notions of the convergence time or w.h.p.-correctness.
All known fast majority population protocols using a polylogarithmic number of states are based in some way on the idea of a sequence of canceling-duplicating (or canceling-doubling) phases, each of length (first introduced in ), synchronizing the nodes across phase boundaries. In our new protocol we still use the canceling-doubling framework (as explained in Section 2) but with shorter phases of length each, at the expense of losing the synchronization. We note that all existing protocols known to us working within the canceling-doubling framework cease to function properly with such short phases. Not only can we no longer guarantee a synchronized transition across phase boundaries (and in order to obtain the correct answer one must not allow opposite opinions from different phases to cancel each other), but we do not even have the guarantee that every node will be activated at all during a phase (in fact, we know some will not). The existing protocols require each node to be activated at least once (actually at least logarithmically many times) during each phase. Our main technical contributions are mechanisms to deal with nodes advancing too slowly or too quickly through the short phases, that is, nodes which are not in sync with the bulk. We believe that some algorithmic and analytical ideas used for this may be of independent interest.
2 Exact majority: the general idea of canceling-doubling phases, and a protocol with time and states
We view the votes as tokens, which can have different ages and values (magnitudes). Initially each node has one token of one of the two opinion types, with age $0$ and value $1$. Throughout the computation, each node either has one token or is empty. In the following we say that two tokens meet if their corresponding nodes interact.
When two opposite tokens (one of each type) of the same value meet, they cancel each other and both nodes become empty. Such an interaction is called canceling.
When a token interacts with an empty node, the token splits into two tokens of the same type, each with age increased by one and half the value, and each of the two involved nodes takes one token. We refer to such splitting of a token also as duplicating or doubling.
Thus the age of a token is equal to the number of times it has undergone splitting, and a token of age $i$ has value $2^{-i}$. Note that any sequence of canceling and splitting interactions preserves the difference between the sum of the values of all tokens of one type and the sum of the values of all tokens of the other type. This difference is always equal to the initial imbalance. The primary objective is to eliminate all minority tokens. When only majority tokens are left in the system, the majority opinion can be propagated to all nodes w.h.p. within additional interactions via a broadcast process. The final standard process of propagating the outcome will be omitted from our descriptions and analysis. That is, from now on we assume that the objective is to eliminate the minority tokens.
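The invariant stated above can be checked mechanically. The sketch below is a minimal illustration, assuming that a token is represented as a (type, age) pair with types "A" and "B", that the initial age is 0, and that a token of age i has value 2^(-i). All names are hypothetical, and the sketch deliberately ignores phases, so it shows only the two elementary rules and not any of the protocols discussed in this paper.

```python
from fractions import Fraction
import random


def token_value(age):
    """Value of a token that has been split `age` times (initial value 1)."""
    return Fraction(1, 2 ** age)


def interact(u, v):
    """Apply canceling or splitting to a pair of nodes.

    A node is None (empty) or a (type, age) tuple.
    """
    if u is None and v is None:
        return u, v
    if u is None or v is None:
        # Splitting: the token is replaced by two tokens of age + 1.
        kind, age = v if u is None else u
        half = (kind, age + 1)
        return half, half
    if u[0] != v[0] and u[1] == v[1]:
        # Canceling: opposite tokens of the same value annihilate.
        return None, None
    return u, v


def signed_total(nodes):
    """Sum of values of "A" tokens minus sum of values of "B" tokens."""
    total = Fraction(0)
    for node in nodes:
        if node is not None:
            kind, age = node
            total += token_value(age) if kind == "A" else -token_value(age)
    return total


def run_random(num_a, num_b, interactions, seed=0):
    """Apply the two rules along a random sequence of pairwise interactions."""
    rng = random.Random(seed)
    nodes = [("A", 0)] * num_a + [("B", 0)] * num_b
    for _ in range(interactions):
        i, j = rng.sample(range(len(nodes)), 2)
        nodes[i], nodes[j] = interact(nodes[i], nodes[j])
    return nodes
```

Exact rational arithmetic is used so that the comparison with the initial imbalance is not affected by rounding. Whatever the interaction sequence, `signed_total(run_random(num_a, num_b, k))` equals `num_a - num_b`.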
In this section we first describe the -step -state Majority protocol presented in . Then we propose two new protocols, both with a runtime of steps: FastMajority1 with states (described and analyzed in Sections 3 and 4) and FastMajority2 with states (outlined in Section 5). Further details of our protocols, including pseudocode and detailed proofs, are given in the Appendix.
The structure of the -step Majority protocol will provide a useful reference in explanations of the computation and the analysis of the faster protocols. From the node’s local point of view, the computation of the Majority protocol consists of at most phases and each phase consists of at most interactions, where is a suitably large constant. Each node keeps track of the number of phases and steps (interactions) within the current phase, and maintains further information which indicates the progress of computation. More precisely, each node keeps the following data, which require states.
– the type of token held by . If then the node is empty.
– the counter of phases.
– the counter of steps in the current phase.
Boolean flags, which are initially false and indicate the following status when set to true:
– has a token which has already doubled in the current phase;
– the node has made the decision on the final output;
– the protocol has failed because of some inconsistencies.
If a node is in neither of the two special states done and fail, then we say that is in a normal state: A node is in Phase if . If is in Phase and is not empty, then the age of the token at is either if not (the token has not doubled yet in this phase) or if . Thus the phase of a token (the phase of the node where the token is) and the flag doubled indicate the age of this token. Throughout the whole computation, the pair can be regarded as the (combined) interaction counter of node . This counter is incremented by at the end of each interaction. Thus, for example, if is equal to after such an increment, then node has just completed a phase.
Each phase is divided into five parts defined below, where is a constant discussed later.
The second part is the canceling stage and the fourth part is the doubling stage, each consisting of steps. If two interacting nodes are in the canceling stage of the same phase and have opposite tokens, then the tokens cancel out. If two interacting nodes are in the doubling stage of the same phase, one of them has a token which has not doubled yet in this phase and the other is empty, then this is a doubling interaction.
The beginning, the middle and the final parts of a phase are buffer zones, consisting of steps each. The purpose of these parts is to ensure that the nodes progress through the current phase in a synchronized way.
If nodes were simply incrementing their step counters by at each interaction, then those counters would start diverging too much for the canceling-doubling process to work correctly. An important aspect of the Majority protocol, as well as our new faster protocols, is the following mechanism for keeping the nodes sufficiently synchronized. When two interacting nodes are in different phases, then the node in the lower phase jumps up to (that is, sets its step counter to) the beginning of the next phase. The Majority protocol relies on this synchronization mechanism in the high probability case when all nodes are in two adjacent parts of a phase (that is, either in two consecutive parts of the same phase, or in the final part of one phase and the beginning part of the next phase.) In this case the process of pulling all nodes up to the next phase follows the pattern of broadcast. The node, or nodes, which have reached the beginning of the next phase by way of normal one-step increments broadcast the message “if you are not yet in the same phase as I am, then jump up to the next phase.” By the time the broadcast is completed (that is, by the time when the message has reached all nodes), all nodes are together in the next phase. It can be shown that there is a constant such that w.h.p. the broadcast completes in random pairwise interactions (see, for example ; other papers may refer to this process as epidemic spreading or rumor spreading).
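The jump-up rule and the broadcast it relies on can be sketched as follows. This is an illustrative sketch, not the pseudocode of the Majority protocol: counters are modelled as hypothetical (phase, step) pairs, `phase_len` is a hypothetical parameter for the number of steps per phase, and the assumption that the node in the higher phase still performs its normal one-step increment is ours. The second function simulates a plain broadcast under the uniform scheduler and counts the interactions until every node is informed.

```python
import random


def tick(counter, phase_len):
    """Normal one-step increment of a (phase, step) counter."""
    phase, step = counter
    step += 1
    # Reaching the end of a phase rolls over to the start of the next one.
    return (phase + 1, 0) if step == phase_len else (phase, step)


def sync_interaction(cu, cv, phase_len):
    """Update the counters of two interacting nodes.

    The node in the lower phase jumps to the beginning of its next phase;
    otherwise a node takes a normal one-step increment (an assumption).
    """
    if cu[0] < cv[0]:
        return (cu[0] + 1, 0), tick(cv, phase_len)
    if cv[0] < cu[0]:
        return tick(cu, phase_len), (cv[0] + 1, 0)
    return tick(cu, phase_len), tick(cv, phase_len)


def broadcast_interactions(n, seed=0):
    """Count interactions until a message held by node 0 reaches all n nodes."""
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True
    count, t = 1, 0
    while count < n:
        i, j = rng.sample(range(n), 2)
        t += 1
        # The message spreads whenever exactly one of the two nodes has it.
        if informed[i] != informed[j]:
            informed[i] = informed[j] = True
            count += 1
    return t
```

Running `broadcast_interactions(n)` for growing `n` gives an empirical feel for how the completion time of the broadcast scales with the population size.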
The constant in the definition of the parts of a phase is suitably smaller than the constant , but sufficiently large to guarantee the following two conditions: (a) the broadcast from a given node to all other nodes completes w.h.p. within interactions; and (b) for a sequence of consecutive interactions, w.h.p. for each node and each , the number of times is selected for interaction within the first interactions differs from the expectation (which is equal to ) by at most . Condition (a) is used when the nodes reaching the end of the current phase initiate broadcast to “pull up” the nodes lagging behind. Condition (a) implies that after interactions, w.h.p. all nodes are in the next phase. Using Condition (b) with , we can also claim that w.h.p. at this point all nodes are within the first steps of the next phase (all nodes are in the next phase and no node interacted more than the expected plus times). Finally Condition (b) applied to all implies that w.h.p. the differences between the individual counts of node interactions do not diverge by more than throughout this phase. We set and take large enough so that (to have at least steps in the canceling and doubling stages) and both Conditions (a) and (b) hold. This way we achieve the following synchronized progress of nodes through a phase: w.h.p. all nodes are in the same part of the same phase before they start moving on to the next part. Moreover, also w.h.p., for each canceling or doubling stage there is a sequence of consecutive interactions when all nodes remain in this stage and each of them is involved in at least interactions.
Thus throughout the computation of the Majority protocol, w.h.p. all nodes are in two adjacent parts of a phase. In particular, w.h.p. the canceling and doubling activities of the nodes are separated. This separation ensures that the cancellation of tokens creates a sufficient number of empty nodes to accommodate new tokens generated by token splitting in the subsequent doubling stage. If two interacting nodes are not in the same or adjacent parts of a phase (a low, but positive, probability), then their local times (step counters) are considered inconsistent and both nodes enter the special fail state. The details of the Majority protocol are given in pseudocode in Algorithms 1 and 2.
From a global point of view, w.h.p. each new phase starts with all nodes in normal states in the beginning of this phase. We say that this phase completes successfully if all nodes are in normal states in the beginning part of the next phase . At this point all tokens have the same value , and the difference between the numbers of opposite tokens is equal to . The computation w.h.p. keeps successfully completing consecutive phases, each phase halving the value of tokens and doubling the difference between tokens and tokens, until the critical phase , which is the first phase when the difference between the numbers of opposite tokens is
The significance of the critical phase is that the large difference between the numbers of opposite tokens means that w.h.p. all minority tokens will be eliminated in this phase, if they have not been eliminated yet in previous phases. More specifically, at the end of phase , w.h.p. only tokens of the majority opinion are left and each of these tokens has value either , if the token has split in this phase, or , otherwise. If at least one token has value , then this token has failed to double during this phase and assumes that the computation has completed. Such a node enters the done state and broadcasts its (majority) opinion to all other nodes. In this case phase is the final phase.
If at the end of the critical phase all tokens have value , then no node knows yet that all minority tokens have been eliminated, so the computation proceeds to the next phase . Phase will be the final phase, since it will start with more than tokens and all of them of the same type, so at least one token will fail to double and will assume that the computation has completed and will enter the done state. The condition that a token has failed to double is taken as indication that w.h.p. all tokens of opposite type have been eliminated. Some tokens may still double in the final phase and enter the next phase (receiving later the message that the computation has completed) but w.h.p. no node reaches the end of phase . Thus the done state is reached w.h.p. within parallel time.
The computation may fail w.l.p. (see Footnote 3) when the step counters of two interacting nodes are not consistent, or a node reaches phase , or two nodes enter the done state with opposite type tokens. Whenever a node realizes that any of these low probability events has occurred, it enters the fail state and broadcasts this state to all other nodes. (The standard broadcast of done and fail states is not included in the pseudocode.)
Footnote 3: w.l.p. (with low probability) means that the opposite event happens w.h.p.
It is shown in  that the Majority protocol stabilizes, either in the correct all-done configuration or in the all-fail configuration, within time w.h.p. and in expectation. The standard technique of combining a fast protocol, which w.l.p. may fail, with a slow but always-correct backup protocol gives an extended Majority protocol, which requires states per node and computes the exact majority within time w.h.p. and in expectation. For the slow always-correct protocol take the four-state majority protocol, run both the fast and the slow protocols in parallel and make the nodes in the fail state adopt the outcome of the slow protocol. The slow protocol runs in expected polynomial time, say in time, but its outcome is used only with low probability of , so it contributes only to the overall expected time.
We omit the details of using a slow backup protocol (see, for example, [2, 10]), and assume that the objective of a canceling-doubling protocol is to use a small number of states , to compute the majority quickly w.h.p., say within a time bound , and to have also low expected time of reaching the correct all-done configuration or the all-fail configuration, say within a bound . If the bounds and are of the same order , then we get a corollary that the majority can be computed with states in time w.h.p. and in expectation.
3 Exact majority in time with states
To improve on the time of the Majority protocol, we shorten the length of a phase to , where . The new FastMajority1 protocol runs in time and requires states per node. We will show in Section 5 that the number of states can be reduced to asymptotically optimal . We keep the term in the description and the analysis of our fast majority protocols to simplify notation and to make it easier to trace where a larger value of would break the proofs.
Phases of sub-logarithmic length are too short to ensure that w.h.p. all tokens progress through the phases synchronously and keep up with required canceling and doubling, as they did in the Majority protocol. In the FastMajority1 protocol, we have a small but w.h.p. positive number of out-of-sync tokens, which move to the next phase either too early or too late (with respect to the expectation) or simply do not succeed with splitting within a short phase. Such tokens stop contributing to the regular dynamics of canceling and doubling. The general idea of our solution is to group consecutive phases (a total of steps) into an epoch, to attach further steps at the end of each epoch to enable the out-of-sync tokens to reach the age required at the end of this epoch, and to synchronize all nodes by the broadcast process at the boundaries of epochs. When analyzing the progress of tokens through the phases of the same epoch, we consider separately the tokens which remain synchronized and the out-of-sync tokens.
We now proceed to the details of the FastMajority1 protocol. Each epoch consists of steps, where is a suitably large constant, and is divided into two equal-length parts. The first part is a sequence of canceling-doubling phases, each of length . The purpose of the second part is to give sufficient time to out-of-sync tokens so that w.h.p. they all complete all splitting required for this epoch. Each node maintains the following data, which can be stored using states. For simplicity of notation, we assume that expressions like and have integer values if they refer to an index (or a number) of phases or steps.
– type of token held by .
– the counter of epochs.
– the age of the token at (if has a token) with respect to the beginning of the current epoch. If is or , then the age of this token is and the value of this token is .
– each epoch consists of two parts, each part has steps. The first part, when , is divided into canceling-doubling phases.
– counter of phases in the first part of the current epoch.
– counter of steps (interactions) in the current phase.
Boolean flags indicating the status of the node, all set initially to false:
, , – as in the Majority protocol;
– has a token which no longer follows the expected progress through the phases of the current epoch;
– the computation is in the additional epoch of phases, with each of these phases consisting now of steps.
We say that a node is in epoch if , and in phase (of the current epoch) if . We view the triplet as the (combined) counter of steps in the current epoch, and the pair as the counter of the steps of the whole protocol. If a node is not in any of the special states or fail, then we say that is in a normal state:
A normal token is a token in a normal node. Each phase is split evenly into the canceling stage (the first steps of the phase) and the doubling stage (the remaining steps).
The vast majority of the tokens are normal tokens progressing through the phases of the current epoch in a synchronized fashion. These tokens are at the same time in the beginning part of the same phase and have the same age (w.r.t. the end of the epoch). They first try to cancel out with tokens of the same age but opposite type during the canceling stage, and if they survive, then they split during the subsequent doubling stage. At some later time most of the tokens will still be normal, but in the beginning part of the next phase and having age . Thus the age of a normal token (w.r.t. the beginning of the current epoch) is equal to its phase, if the token has not split yet in this phase, or to its phase plus , if the token has split (this is recorded by setting the flag doubled).
As in the Majority protocol, we separate the canceling and the doubling activities to ensure that the canceling of tokens first creates a sufficient number of empty nodes to accommodate the new tokens obtained later from splitting. Unlike the Majority protocol, the FastMajority1 protocol does not have buffer zones within a phase. Such zones would not help in the context of shorter, sublogarithmic phases, where we cannot guarantee anyway that w.h.p. all nodes progress through a phase in a synchronized way.
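The canceling and splitting rules described above can be illustrated with a minimal sketch. It is not the pseudocode of the protocol: the names `Token`, `cancel`, `split` and `total_value` are hypothetical, the value of a token of age a is assumed to be the signed quantity 2^(-a), and the stage checks, step counters and the doubled flag are omitted.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Token:
    opinion: int  # +1 or -1 (assumed encoding of the two opinions)
    age: int      # number of times this token has been split

    @property
    def value(self) -> Fraction:
        # Assumption: each split halves the value, so value = +-2^(-age).
        return Fraction(self.opinion, 2 ** self.age)


Node = Optional[Token]  # None models an empty node


def cancel(u: Node, v: Node) -> Tuple[Node, Node]:
    """Canceling stage: opposite tokens of the same age annihilate."""
    if u is not None and v is not None:
        if u.age == v.age and u.opinion == -v.opinion:
            return None, None
    return u, v


def split(u: Node, v: Node) -> Tuple[Node, Node]:
    """Doubling stage: a token meeting an empty node splits into two halves."""
    if u is not None and v is None:
        half = Token(u.opinion, u.age + 1)
        return half, half
    if v is not None and u is None:
        half = Token(v.opinion, v.age + 1)
        return half, half
    return u, v


def total_value(nodes: Iterable[Node]) -> Fraction:
    """Signed sum of all token values; both rules leave it unchanged."""
    return sum((t.value for t in nodes if t is not None), Fraction(0))
```

In this sketch both rules preserve the signed total value, which is why canceling first (to free empty nodes) and splitting afterwards does not change which opinion holds the majority.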
A token which has failed to split in one of the phases of the current epoch becomes an out-of-sync token (the out_of_sync flag is set). Such a token no longer follows the regular canceling-doubling phases of the epoch, but instead tries cascading splitting to break up into tokens of age (relative to the beginning of the epoch) as expected by the end of this epoch. An out-of-sync token does not attempt canceling out because there would be only relatively few opposite tokens of the same value, so the chance of meeting one of them is small (too small to make a difference in the analysis). The tokens obtained from splitting out-of-sync tokens inherit the out-of-sync status. A token drops the out-of-sync status if it is in the second part of the epoch and has reached the age . (Alternatively, out-of-sync tokens could switch back to the normal status as soon as their age coincides again with their phase, but this would complicate the analysis.) An out-of-sync node is a node with an out-of-sync token. While each normal node and token is in a specific phase of the first part of an epoch or is in the second part of an epoch, the out-of-sync nodes (tokens) belong to an epoch but not to any specific phase. The objective for a normal token is to split into two halves in each phase of the current epoch (if it survives canceling). The objective of an out-of-sync token is to keep splitting in the current epoch (disregarding the boundaries of phases) until it breaks into tokens expected at the end of this epoch.
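Cascading splitting can be illustrated by a small simulation. This is only a sketch under simplifying assumptions: a single out-of-sync token starts among otherwise empty nodes, all other tokens and all counters are ignored, and the function name `cascade_split` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def cascade_split(n: int, start_age: int, target_age: int,
                  seed: int = 0) -> Tuple[List[int], int]:
    """Split one token repeatedly until every piece has age target_age.

    Requires n >= 2 ** (target_age - start_age), so that enough empty
    nodes exist to hold all the pieces.
    Returns the ages of the final tokens and the number of interactions.
    """
    rng = random.Random(seed)
    ages: List[Optional[int]] = [None] * n  # None models an empty node
    ages[0] = start_age
    steps = 0
    while any(a is not None and a < target_age for a in ages):
        u, v = rng.sample(range(n), 2)  # uniform random pair of distinct nodes
        steps += 1
        for x, y in ((u, v), (v, u)):
            a = ages[x]
            if a is not None and a < target_age and ages[y] is None:
                # The token at x splits: both nodes now hold a half of age a + 1.
                ages[x] = a + 1
                ages[y] = a + 1
                break
    return [a for a in ages if a is not None], steps
```

For example, a token of age 2 with target age 5 always ends up as 2^3 = 8 tokens of age 5 in this sketch; only the number of interactions needed depends on the random schedule.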
We show in our analysis that w.h.p. there are only out-of-sync tokens in one epoch. W.h.p. all out-of-sync tokens in the current epoch reach the age (w.r.t. the beginning of the epoch) by the mid point of the second part of the epoch (that is, by the step of the epoch), for each epoch before the final epoch . In the final epoch at least one out-of-sync token completes the epoch without reaching the required age.
When the system completes the final epoch, the task of determining the majority opinion is not fully achieved yet. In contrast to the Majority protocol where on the completion of the final phase w.h.p. only majority tokens are left, in the FastMajority1 protocol there may still be a small number of minority tokens at the end of the final epoch, so some further work is needed. A node which has failed to reach the required age by the end of the current epoch, identifying that way that this is the final epoch, enters the additional_epoch state and propagates this state through the system to trigger an additional epoch of phases. More precisely, the additional epoch consists of at most phases corresponding to epochs (if ), and , and each phase has now steps. W.h.p. these phases include the critical phase and the phase , defined by (1). The computation of the additional epoch is as in the Majority protocol, taking time to reach the correct all-done configuration w.h.p. or the all-fail configuration w.l.p.
Two interacting nodes first check the consistency of their time counters (the counters of interactions) and switch to fail states if the difference between the counters is greater than . If the counters are consistent but the nodes are in different epochs (so one is at the end of an epoch while the other is at the beginning of the next epoch), then the node in the lower epoch jumps up to the beginning of the next epoch. This is the synchronization mechanism at the boundaries of epochs, analogous to the synchronization by broadcast at the boundaries of phases in the Majority protocol. In the FastMajority1 protocol, however, it is not possible to synchronize the nodes at the boundaries of (short) phases.
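The consistency check and the jump at an epoch boundary can be sketched as follows. The sketch makes several assumptions that are not taken from the pseudocode of the protocol: the names `Clock` and `reconcile`, the parameters `epoch_len` and `max_gap`, and the encoding of the time of a node as `epoch * epoch_len + epoch_step`.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Clock:
    epoch: int
    epoch_step: int
    fail: bool = False


def reconcile(u: Clock, v: Clock, epoch_len: int,
              max_gap: int) -> Tuple[Clock, Clock]:
    """Check the time counters of two interacting nodes.

    epoch_len and max_gap are hypothetical parameters standing in for the
    epoch length and the allowed difference between the counters.
    """
    time_u = u.epoch * epoch_len + u.epoch_step
    time_v = v.epoch * epoch_len + v.epoch_step
    if abs(time_u - time_v) > max_gap:
        # Inconsistent counters: both nodes switch to fail states.
        return replace(u, fail=True), replace(v, fail=True)
    if u.epoch < v.epoch:
        # The node in the lower epoch jumps to the beginning of the next epoch.
        u = Clock(v.epoch, 0)
    elif v.epoch < u.epoch:
        v = Clock(u.epoch, 0)
    return u, v
```

With `epoch_len = 100` and `max_gap = 10`, a node at step 98 of epoch 1 meeting a node at step 3 of epoch 2 is moved to step 0 of epoch 2, while the other node is left unchanged.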
For details of the FastMajority1 protocol, see the pseudocodes given in Algorithms 3–6 in the Appendix. The pseudocodes do not include the details of the additional epoch, since this final part of the computation follows closely the Majority protocol. To enable the initialization of the additional epoch, the nodes keep track of the tokens they have had at the end of the two previous epochs. (All nodes have to know their tokens from the beginning of epoch , but there may be nodes which have already progressed to epoch when they are notified that epoch is the final one.) The additional epoch does not need additional states since it can (re-)use the existing states.
4 Analysis of the FastMajority1 protocol
Ideally, we would like that w.h.p. all tokens progress through the phases of the current epoch in a synchronized way, that is, all tokens are roughly in the same part of the same phase, as in the Majority protocol. This would mean that w.h.p. at some (global) time all nodes are in the beginning part of the same phase, ensuring that all tokens have the same value , and at some later point all nodes are in the end part of this phase and all surviving tokens have value . This ideal behavior is achieved by the Majority protocol at the cost of having -step phases. As discussed in Section 2, the logarithmic length of a phase gives also sufficient time to synchronize w.h.p. the local times of all nodes at the end of a phase so that they all end up together in the beginning part of the next phase.
Now, with phases having only steps, we face the following two difficulties in the analysis. First, while a good number of tokens split during such a shorter phase, w.h.p. there are also some tokens which do not split. Second, phases of length are too short to keep the local times of the nodes synchronized. We can show again that a good number of nodes proceed in a synchronized manner, but w.h.p. there are nodes falling behind or rushing ahead and our analysis has to account for them.
Counting the phases across the epochs, we define the critical phase as in (1). Similarly as in the -time Majority protocol, the computation proceeds through the phases moving from epoch to epoch until the critical phase . Then the computation gets stuck on this phase or on the next phase . Some tokens do not split in that final phase or in any subsequent phase of the current epoch because there are not enough empty nodes to accommodate new tokens. Almost all minority tokens have been eliminated, so the creation of empty nodes by cancellations of opposite tokens has all but stopped. This is the final epoch and the nodes which do not split to the value required at the end of this epoch trigger the additional epoch of phases, each having steps. The additional epoch is needed because we do not have a high-probability guarantee that all minority tokens are eliminated by the end of the final epoch. The small number of remaining minority tokens may have various values which are inconsistent with the values of the majority tokens, so further cancellations of tokens might not be possible. The additional epoch includes the phases of the three consecutive epochs and to ensure that w.h.p. both phases and are included. Phase can be as early as the last phase in epoch and phase can be as late as the first phase in epoch .
The following conditions describe the regular configuration of the whole system at the beginning of epoch , and the corresponding Lemma 4 summarizes the progress of the computation through this epoch. Recall that the FastMajority1 protocol is parameterized by a suitably large constant and our analysis refers also to another smaller constant . We refer to the first (resp. the last) steps of a phase or a stage as the beginning (resp. the end) part of this phase or stage. The (global) time steps count the number of interactions of the whole system.
At least nodes are in normal states, are in epoch , and their epoch_step counters are at most .
For each remaining node ,
is in a normal state in epoch and (that is, is in the last quarter of epoch ), or
is in a normal or out-of-sync state in epoch and .
Consider an arbitrary epoch such that phase belongs to an epoch and assume that at some (global) step the condition holds.
If phase does not belong to epoch (that is, phase is in a later epoch ), then w.h.p. there is a step when the condition holds.
If both phases and belong to epoch , then w.h.p. there is a step when
a node is completing epoch and enters the additional_epoch state (because it has a token which has not split to the value required at the end of this epoch); and
all other nodes are in normal or out-of-sync states in the second part of epoch or the first part of epoch .
Otherwise, that is, if phase is the last phase in epoch (and is the first phase in epoch ), then w.h.p. either there is a step when the above condition for the end of epoch holds, or all nodes eventually progress to epoch and there is a step when the condition analogous to but for the end of epoch holds.
The condition given below describes the regular configuration of the whole system at the beginning of phase in epoch . We note that the last phase in an epoch is phase and the condition refers in fact to the beginning of the second part of the epoch. A normal token in the beginning of phase in epoch has (absolute) value and relative values , , and w.r.t. (the beginning of) this phase, the end of this phase, the beginning of this epoch and the end of this epoch, respectively. It may also be helpful to recall that for a given node , phase starts at ’s epoch step . Observe that implies .
The set of nodes which are normal and in the beginning part of phase in epoch has size at least . That is, a node is in if, and only if, is true, , , and either and , if , or and , if .
Let denote the set of the remaining nodes.
For each :
is a normal node in epoch , and ; or is in a normal or out-of-sync state in epoch and .
The total value of the tokens in w.r.t. the end of epoch is at most .
For an epoch and a phase in this epoch, let denote the global index of this phase. We show that w.h.p. the condition holds at the beginning of each phase .
For arbitrary and such that , assume that the condition holds at some (global) time step and the condition holds at the step . Then the following conditions hold, where .
If , then w.h.p. at step the condition holds.
If , then w.h.p. at step the total value, w.r.t. the end of epoch , of the minority-opinion tokens is .
Lemma 4 is proven by analyzing the cancellations and duplications of tokens in one phase. Lemma 4 is proven by applying inductively Lemma 4. In turn, Theorem 4 below, which states the bound on the completion time of the FastMajority1 protocol, can be proven by applying inductively Lemma 4.
The FastMajority1 protocol uses states, computes the majority w.h.p. within time and reaches the correct all-done configuration or the all-fail configuration within the expected time.
The majority can be computed with states in time w.h.p. and in expectation.
We now give some further explanations of the structure of our analysis, referring the reader to the Appendix for the formal proofs. Lemma 4 and Claim 4 show the synchronization of the nodes which we rely on in our analysis. Lemma 4 is used in the proof of Lemma 4, where we analyze the progress of the computation through one epoch consisting of interactions ( parallel steps). Lemma 4 can be easily proven using first Chernoff bounds for a single node and then the union bound over all nodes. The proof of Claim 4 is considerably more involved, but we need this claim in the proof of Lemma 4, where we look at the finer scale of individual phases and have to consider intervals of interactions of a given node. This claim shows, in essence, that most of the nodes stay tightly synchronized when they move from phase to phase through one epoch. The epoch_step counters of these nodes stay in a range of size at most .
For each sufficiently large constant and for , during a sequence of interactions, with probability at least , where grows to infinity with increasing , the number of interactions of each node is within from the expectation of interactions.
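The concentration of the per-node interaction counts can be observed empirically with a short simulation of the random uniform scheduler. This is only an illustrative sketch with hypothetical names (`interaction_counts`, `max_deviation`). It relies on one fact: each interaction involves two nodes, so over T interactions the expected number of interactions of a fixed node is 2T/n.

```python
import random
from typing import List


def interaction_counts(n: int, T: int, seed: int = 0) -> List[int]:
    """Count how often each of n nodes interacts during T scheduler steps."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(T):
        u, v = rng.sample(range(n), 2)  # uniform random pair of distinct nodes
        counts[u] += 1
        counts[v] += 1
    return counts


def max_deviation(counts: List[int], T: int) -> float:
    """Largest distance of a node's count from the expectation 2T/n."""
    expected = 2 * T / len(counts)
    return max(abs(c - expected) for c in counts)
```

Running `max_deviation(interaction_counts(n, T), T)` for growing T shows the deviation growing much more slowly than the expectation 2T/n, in line with the Chernoff-bound argument mentioned above.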
For a fixed , assume that holds at a time step . Let be the set of nodes which satisfy at this step the condition 1 of (that is, is the set of nodes which are in epoch with epoch_step counters at most ). Then at an arbitrary but fixed time step , w.h.p. all nodes in are in epoch and all but of them have their epoch_step counters within from .
Lemmas 4 and 4 describe the performance of the broadcast process in the population-protocol model. Lemma 4 has been used before and is proven, for example, in . Lemma 4 gives a more detailed view of the dynamics of the broadcast process, which we need in the context of Lemma 4 to show that the synchronization at the end of an epoch gives w.h.p. .
For each sufficiently large constant , the broadcast completes within interactions.
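The broadcast process can be simulated directly. The sketch below assumes the usual formulation in which one node is informed initially and an interaction informs both participants if at least one of them is already informed. The function name `broadcast_interactions` is hypothetical.

```python
import random


def broadcast_interactions(n: int, seed: int = 0) -> int:
    """Number of interactions until all n nodes are informed.

    Assumption: node 0 is the only informed node at the start, and the
    scheduler picks a uniform random pair of distinct nodes in each step.
    """
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True
    num_informed = 1
    steps = 0
    while num_informed < n:
        u, v = rng.sample(range(n), 2)
        steps += 1
        if informed[u] != informed[v]:
            # Exactly one of the two nodes is informed: the other one learns.
            informed[u] = informed[v] = True
            num_informed += 1
    return steps
```

Each interaction informs at most one new node, so the returned value is always at least n - 1. Comparing it with n log n for a range of n gives a feel for the bound stated in the lemma.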
Let be any constant in and let and be sufficiently large constants. Consider the broadcast process and let be the first step when nodes are already informed and . Then the following conditions hold.
With probability at least , nodes receive the message for the first time within the consecutive interactions .
, and no node interacts more than times within interval .
With probability at least , there are nodes which interact within interval at least times but not more than times.
5 Reducing the number of states to
Our FastMajority1 protocol described in Section 3 requires states per node. Using the idea underlying the constructions of leaderless phase clocks in  and , we now modify FastMajority1 into the protocol FastMajority2, which still works in time but has only the asymptotically optimal states per node. (Note that using the phase clock from  would not result in fewer states being needed for our protocol.) The general idea is to separate from the whole population a subset of clock nodes, whose only function is to keep the time for the whole system. The other nodes work on computing the desired output and check whether they should progress to the next stage of the computation when they interact with clock nodes. We note that while we use a similar general structure and terminology as in , the meaning of some terms and the dynamics of our phase clock are somewhat different. A notable difference is that in  the clock nodes keep their time counters synchronized on the basis of the power of two choices in load balancing: when two nodes meet, only the lower counter is incremented. In contrast, we keep the updates of time counters as in the Majority and FastMajority1 protocols: both interacting clock nodes increment their time counters, with the exception that the slower node is pulled up to the next -length phase or epoch, if the faster node is already there.
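The two ways of updating the time counters of two interacting clock nodes can be contrasted in a small sketch. Both functions are illustrative only. The names `update_two_choices` and `update_both` and the parameter `block_len` (the length of the phase or epoch at whose boundary the slower node is pulled up) are hypothetical, ties in the first rule are broken arbitrarily in favour of the first node, and the wrap-around of the counters is ignored.

```python
from typing import Tuple


def update_two_choices(a: int, b: int) -> Tuple[int, int]:
    """Power-of-two-choices rule: only the lower counter is incremented."""
    if a <= b:  # tie-breaking towards the first node is an assumption
        return a + 1, b
    return a, b + 1


def update_both(a: int, b: int, block_len: int) -> Tuple[int, int]:
    """Both counters are incremented, unless the nodes are in different
    blocks of length block_len: then the slower node is pulled up to the
    beginning of the block of the faster node."""
    block_a, block_b = a // block_len, b // block_len
    if block_a < block_b:
        return block_b * block_len, b + 1
    if block_b < block_a:
        return a + 1, block_a * block_len
    return a + 1, b + 1
```

For instance, with `block_len = 10`, counters 3 and 4 become 4 and 5, while counters 8 and 12 become 10 and 13, since the slower node is moved to the start of the block that the faster node has already reached.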
The nodes in the FastMajority2 protocol are partitioned into two sets with nodes in each set. One set consists of worker nodes, which may carry opinion tokens and work through canceling-doubling phases to establish the majority opinion. These nodes maintain only the information whether they carry a token and, if so, the value of the token (equivalently, the age of the token, that is, the number of times this token has been split). Each worker node also has a constant number of flags which indicate the current activities of the node (for example, whether it is in the canceling stage of a phase), but it does not maintain a detailed step counter. The other set consists of clock nodes, which maintain their detailed epoch-step counters, counting interactions with other clock nodes modulo , for a suitably large constant , and synchronizing with other clocks by the broadcast mechanism at the end of each epoch. Thus the clock nodes update their counters in the same way as all nodes would update their counters in the FastMajority1 protocol, so Lemma 4 applies with some obvious adaptation (the number of all nodes changes to the number of clock nodes and only interactions between clock nodes are counted).
The worker nodes interact with each other in a similar way as in FastMajority1, but now, to progress through the computation in an orderly way, they rely on the relatively tight synchronization of clock nodes. A worker node advances to the next part of the current phase (or to the next phase, or the next epoch) when it interacts with a clock node whose clock indicates that should progress. There is also a third type of node, the terminator nodes, which will appear later in the computation. A worker or clock node becomes a terminator node when it enters a done or fail state. The meaning and function of these special states are as in the protocols Majority and FastMajority1. In the Appendix we show how to convert a majority input instance into the required initial workers-clocks configuration.
Referring to the state space of the FastMajority1 protocol, in the FastMajority2 protocol each worker node maintains data fields , and to carry information about tokens and their ages, and a constant number of flags to keep track of the status of the node and its progress through the current epoch and the current phase. These include the status flags from the FastMajority1 protocol , and , and flags indicating the progress: the flag from FastMajority1 and a new (multi-valued) flag . The clock nodes maintain the epoch_step counters. The nodes have a constant number of further flags, for example to support the initialization to workers and clocks and the implementation of the additional epoch and the slow backup protocol. Thus in total each node has only states.
Further details of FastMajority2, including pseudocodes and an outline of the proof of Theorem 5, which summarizes the performance of this protocol, are given in the Appendix.
The FastMajority2 protocol uses states, computes the exact majority w.h.p. within parallel time and stabilizes (in the correct all-done configuration or in the all-fail configuration) within the expected parallel time.
The exact majority can be computed with states in parallel time w.h.p. and in expectation.
-  Dan Alistarh, James Aspnes, David Eisenstat, Rati Gelashvili, and Ronald L. Rivest. Time-space trade-offs in population protocols. In Philip N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 2560–2579. SIAM, 2017. URL: http://dx.doi.org/10.1137/1.9781611974782.169, doi:10.1137/1.9781611974782.169.
-  Dan Alistarh, James Aspnes, and Rati Gelashvili. Space-optimal majority in population protocols. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 2221–2239, 2018. URL: https://doi.org/10.1137/1.9781611975031.144, doi:10.1137/1.9781611975031.144.
-  Dan Alistarh, Rati Gelashvili, and Milan Vojnovic. Fast and exact majority in population protocols. In Chryssis Georgiou and Paul G. Spirakis, editors, Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC 2015, Donostia-San Sebastián, Spain, July 21 - 23, 2015, pages 47–56. ACM, 2015. URL: http://doi.acm.org/10.1145/2767386.2767429, doi:10.1145/2767386.2767429.
-  Dana Angluin, James Aspnes, Zoë Diamadi, Michael J. Fischer, and René Peralta. Computation in networks of passively mobile finite-state sensors. Distributed Computing, 18(4):235–253, 2006. URL: http://dx.doi.org/10.1007/s00446-005-0138-3, doi:10.1007/s00446-005-0138-3.
-  Dana Angluin, James Aspnes, and David Eisenstat. Fast computation by population protocols with a leader. Distributed Computing, 21(3):183–199, September 2008.
-  James Aspnes and Eric Ruppert. An introduction to population protocols. In Benoît Garbinato, Hugo Miranda, and Luís Rodrigues, editors, Middleware for Network Eccentric and Mobile Applications, pages 97–120. Springer-Verlag, 2009.
-  Florence Bénézit, Patrick Thiran, and Martin Vetterli. Interval consensus: From quantized gossip to voting. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009, 19-24 April 2009, Taipei, Taiwan, pages 3661–3664. IEEE, 2009. URL: http://dx.doi.org/10.1109/ICASSP.2009.4960420, doi:10.1109/ICASSP.2009.4960420.
-  Petra Berenbrink, Robert Elsässer, Tom Friedetzky, Dominik Kaaser, Peter Kling, and Tomasz Radzik. Majority & stabilization in population protocols. Unpublished manuscript, available on arXiv, May 2018.
-  Petra Berenbrink, Tom Friedetzky, Peter Kling, Frederik Mallmann-Trenn, and Chris Wastell. Plurality consensus via shuffling: Lessons learned from load balancing. CoRR, abs/1602.01342, 2016. URL: http://arxiv.org/abs/1602.01342.
-  Andreas Bilke, Colin Cooper, Robert Elsässer, and Tomasz Radzik. Brief announcement: Population protocols for leader election and exact majority with O(log n) states and O(log n) convergence time. In Elad Michael Schiller and Alexander A. Schwarzmann, editors, Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC 2017, Washington, DC, USA, July 25-27, 2017, pages 451–453. ACM, 2017. Full version available at arXiv:1705.01146. URL: http://doi.acm.org/10.1145/3087801.3087858, doi:10.1145/3087801.3087858.
-  Moez Draief and Milan Vojnovic. Convergence speed of binary interval consensus. In INFOCOM 2010. 29th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 15-19 March 2010, San Diego, CA, USA, pages 1792–1800. IEEE, 2010. URL: http://dx.doi.org/10.1109/INFCOM.2010.5461999, doi:10.1109/INFCOM.2010.5461999.
-  Leszek Gasieniec and Grzegorz Stachowiak. Fast space optimal leader election in population protocols. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 2653–2667, 2018. URL: https://doi.org/10.1137/1.9781611975031.169, doi:10.1137/1.9781611975031.169.
-  Leszek Gasieniec, Grzegorz Stachowiak, and Przemyslaw Uznanski. Almost logarithmic-time space optimal leader election in population protocols. CoRR, abs/1802.06867, 2018. URL: http://arxiv.org/abs/1802.06867, arXiv:1802.06867.
-  Mohsen Ghaffari and Merav Parter. A polylogarithmic gossip algorithm for plurality consensus. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing PODC, pages 117–126, 2016.
-  A. Kosowski and P. Uznański. Population Protocols Are Fast. ArXiv e-prints, April 2018. arXiv:1802.06872v2.
-  George B. Mertzios, Sotiris E. Nikoletseas, Christoforos L. Raptopoulos, and Paul G. Spirakis. Determining majority in networks with local interactions and very small local memory. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming, volume 8572 of Lecture Notes in Computer Science, pages 871–882. Springer Berlin Heidelberg, 2014. URL: http://dx.doi.org/10.1007/978-3-662-43948-7_72, doi:10.1007/978-3-662-43948-7_72.
-  Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005.
-  Thomas Sauerwald and He Sun. Tight bounds for randomized load balancing on arbitrary network topologies. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 341–350. IEEE Computer Society, 2012. URL: http://dx.doi.org/10.1109/FOCS.2012.86, doi:10.1109/FOCS.2012.86.
Appendix A
A.1 Pseudocodes for Section 2 – -time Majority protocol
This section contains our pseudocodes left out from Section 2 for our Majority protocol.
A.2 Pseudocodes for Section 3 – protocol FastMajority1
This section contains our pseudocodes left out from Section 3 for our FastMajority1 protocol.
A.3 Proofs for Section 4 – protocol FastMajority1
For convenience we assume in the proofs that opinion is the majority opinion, that is, .
Proof of Lemma 4. We consider an epoch such that phase belongs to this or a later epoch and assume that the condition holds at a (global) step .
Case 1: phase belongs to a later epoch .
Assume therefore that the condition holds at step . Thus for each node , is in epoch and . Actually, for most of the nodes , (from the condition 1 of ), but there may be a small number of nodes with their epoch_step counters outside this range.
At step , the total value, w.r.t. the end of epoch , of the tokens which are out-of-sync or in nodes with epoch_step counters outside the interval is at most (from the condition 2b of ). We first wait until the step to ensure that w.h.p. all nodes are in the second part of the epoch. Indeed, the conditions of the system at step and Lemma 4 applied to steps give that at step w.h.p. all epoch_step counters are within the interval .
At step most of the tokens are normal, that is, their value is as required at the end of epoch (and at the beginning of the next epoch ). The out-of-sync tokens have values larger than , but at most , and their total value is at most w.r.t. the end of epoch . We view the set of out-of-sync tokens as the set of base tokens of value , which are grouped into larger tokens. That is, an out-of-sync token of value , for some , is a group of base tokens. The number of base tokens is at most . We consider an arbitrary base token and show that w.h.p. this token interacts in steps