In this paper, we study random gossip processes in smartphone peer-to-peer networks. We prove the best-known gossip bound in the standard synchronous model used to describe this setting, and then establish new results in a novel asynchronous variation of this model that more directly matches the real-world behavior of smartphone networks. Our results imply that simple information spreading strategies work surprisingly well in this complicated but increasingly relevant environment.
In more detail, a random gossip process is a classical strategy for spreading messages through a peer-to-peer network. It has the communicating nodes randomly select connection partners from their eligible neighbors, and then, once connected, exchange useful information. (The main place where different random gossip processes vary is in their definition of “eligible.” What unites them is the same underlying approach of random connections to nearby nodes.) As elaborated in Section II, these random processes are well studied in standard peer-to-peer models, where they have been shown to spread information efficiently despite their simplicity.
To date, however, little is known about these processes in the emerging setting of smartphone peer-to-peer networks, in which nearby smartphone devices connect with direct radio links that do not require WiFi or cellular infrastructure. As also elaborated in Section II, both Android and iOS now provide support for these direct peer-to-peer connections, enabling the possibility of smartphone apps that generate large peer-to-peer networks that can be deployed, for example, when infrastructure is unavailable (e.g., due to a disaster) or censored (e.g., due to government repression). This paper investigates whether the random gossip processes that have been shown to spread information well in other peer-to-peer settings will prove similarly useful in this intriguing new context.
I-A The Mobile Telephone Model (MTM)
The mobile telephone model (MTM), introduced by Ghaffari and Newport , extends the well-studied telephone model of wired peer-to-peer networks (e.g.,[2, 3, 4, 5, 6, 7, 8, 9, 10]) to better capture the dynamics of standard smartphone peer-to-peer libraries. In recent years, several important peer-to-peer problems have been studied in the MTM, including rumor spreading , load balancing , leader election , and gossip .
As we elaborate in Section III-A, the mobile telephone model describes a peer-to-peer network topology with an undirected graph, where the nodes correspond to the wireless devices, and an edge between two nodes indicates the corresponding devices are close enough to enable a direct device-to-device link. Time proceeds in synchronous rounds. At the beginning of each round, each node can advertise a bounded amount of information to its neighbors in the topology. At this point, each node can then decide to either send a connection invitation to a neighbor, or instead receive these invitations, choosing at most one incoming invitation to accept, forming a connection. Once connected, a pair of nodes can perform a bounded amount of communication before the round ends. Each node can participate in at most one connection per round.
I-B Gossip in the MTM
The gossip problem assumes that out of the nodes start with a gossip message. The problem is solved once all nodes have learned all messages. In the context of the MTM, we typically assume that at most gossip messages can be transferred over a connection in a single round, and that advertisements are bounded to at most bits.
A natural random gossip process in this setting is the following: In each round, each node advertises a hash of its token set and flips a fair coin to decide whether to send or receive connection invitations. If a node decides to send, and it has at least one neighbor advertising a different hash (implying non-equal token sets), then it selects among these neighbors with uniform randomness to choose a single recipient of a connection invitation. If the invitation is accepted the two nodes exchange a constant number of tokens in their set difference.
It is straightforward to establish that, with high probability in the network size $n$, this process solves gossip in $O(nk)$ rounds, where $k$ is the number of gossip messages. The key insight is that in every round there is at least one potentially productive connection, and that there can be at most $nk$ such connections before all nodes know all messages. (See  for the details of this analysis.)
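The simple random gossip process described above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the authors' implementation: the function names (`token_hash`, `gossip_round`, `run_gossip`) are hypothetical, SHA-256 stands in for the unspecified hash function, tokens are assumed to be mutually comparable values such as integers, every invitation is delivered (the model only guarantees a non-empty subset), and one token is exchanged in each direction per connection.

```python
import hashlib
import random

def token_hash(tokens):
    """Advertisement: an order-independent digest of a node's token set."""
    data = ",".join(sorted(map(str, tokens))).encode()
    return hashlib.sha256(data).hexdigest()

def gossip_round(adj, tokens, rng):
    """Run one synchronous round; return the number of connections formed.

    adj:    dict node -> list of neighbors (undirected topology)
    tokens: dict node -> set of tokens currently known (mutated in place)
    """
    ads = {u: token_hash(tokens[u]) for u in adj}
    sending = {u: rng.random() < 0.5 for u in adj}    # fair coin per node
    invites = {u: [] for u in adj}                    # receiver -> senders
    for u in adj:
        if not sending[u]:
            continue
        # Eligible neighbors advertise a different hash.
        eligible = [v for v in adj[u] if ads[v] != ads[u]]
        if eligible:
            invites[rng.choice(eligible)].append(u)
    connections = 0
    for v, senders in invites.items():
        if sending[v] or not senders:
            continue                                  # only receivers accept
        u = rng.choice(senders)                       # accept one invitation
        # Exchange one token in each direction from the set difference.
        to_v = sorted(tokens[u] - tokens[v])[:1]
        to_u = sorted(tokens[v] - tokens[u])[:1]
        tokens[v].update(to_v)
        tokens[u].update(to_u)
        connections += 1
    return connections

def run_gossip(adj, tokens, seed=0, max_rounds=100_000):
    """Return the number of rounds until every node knows every token."""
    rng = random.Random(seed)
    everything = set().union(*tokens.values())
    for r in range(1, max_rounds + 1):
        gossip_round(adj, tokens, rng)
        if all(t == everything for t in tokens.values()):
            return r
    return None

if __name__ == "__main__":
    # A 6-node line with one token at each endpoint.
    line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
    start = {i: ({i} if i in (0, 5) else set()) for i in range(6)}
    print(run_gossip(line, start, seed=1))
```

Because a sender issues a single invitation and a receiver accepts at most one, each node takes part in at most one connection per round, matching the model's restriction.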
In our previous work on gossip in the MTM , we explored the conditions under which one could improve on this crude bound. We conjectured that a more sophisticated analysis could show that simple random processes improve on the bound given sufficient graph connectivity, but were unable to make such an analysis work. Accordingly, in  we turned our attention to a more complicated gossip algorithm called crowded bin. Unlike the simple structure of random gossip processes, crowded bin requires non-trivial coordination among nodes, having them run a distributed size estimation protocol (based on a balls-in-bins analysis) on, and then using these estimates to parametrize a distributed TDMA protocol that eventually enables independent token spreading processes to run in parallel.
In , we prove that crowded bin solves gossip in rounds, with high probability in , when run in a network topology with vertex expansion (see below). For all but the smallest values of (i.e., least amounts of connectivity), this result is an improvement over the crude bound achieved by the random process.
A key open question from this previous work is whether or not it is possible to close the time complexity gap between the appealingly simple random gossip processes and the more complicated machinations of crowded bin. As we detail next, this is the question tackled in this paper.
I-C New Result #1: Improved Analysis for Gossip in the MTM
In Section III-A, we consider a variation of the simple random gossip process described above, modified only slightly such that a node only considers a neighbor eligible if it advertises a different hash and it has not recently attempted a connection with that particular neighbor. We call this variation random spread gossip.
By introducing a new analysis technique, we significantly improve on the straightforward bound for random gossip processes like random spread. Indeed, we prove this process is actually slightly more efficient than the more complicated crowded bin algorithm from , showing that with high probability in , random spread requires only rounds to spread all the messages in a network with vertex expansion .
The primary advantage of random spread gossip is its simplicity. As with most random gossip processes, its behavior is straightforward and easy to implement as compared to existing solutions. A secondary advantage is that this algorithm works in the ongoing communication scenario in which new rumors keep arriving in the system. Starting from any point in an execution, if there are rumors that are not yet fully disseminated, they will reach all nodes in at most an additional rounds, regardless of how many rumors have been previously spread. The solution in , by contrast, must be restarted for each collection of rumors, and includes no mechanism for devices to discover that gossip has completed for the current collection. Accordingly, this new result fully supersedes the best known existing results for gossip in the MTM under similar assumptions. (In , we also study gossip under other assumptions, like changing communication graphs and the lack of good hash functions.)
At the core of our analysis is a new data structure we call a size band table that tracks the progress of the spreading rumors. We use this table to support an amortized analysis of spreading that proves that the stages in which rumors spread slowly are balanced out sufficiently by stages in which they spread quickly, providing a reasonable average rate.
I-D New Result #2: Gossiping in the Asynchronous MTM
The mobile telephone model is a high-level abstraction that captures the core dynamics of smartphone peer-to-peer communication, but it does not exactly match the behavior of real smartphone networking libraries. The core difference between theory and practice in this context is synchronization. To support deep analysis, this abstract model (like many models used to study distributed graph algorithms) synchronizes devices into well-defined rounds. Real smartphones, by contrast, do not offer this synchronization. It follows that algorithms developed in the mobile telephone model cannot be directly implemented on real hardware.
With the goal of closing this gap, in Section IV we introduce the asynchronous mobile telephone model (aMTM), a variation of the MTM that removes the synchronous round assumption, allowing nodes and communication to operate at different speeds. The main advantage of the aMTM is that algorithms specified and analyzed in the aMTM can be directly implemented using existing smartphone peer-to-peer libraries. The main disadvantage is that the introduction of asynchrony complicates analysis.
In Section IV, we first study the question of whether simple random gossip processes even still converge in a setting where nodes and messages can operate at different speeds controlled by an adversary. We answer this question positively by proving that a simple random gossip process solves gossip in time, where is an upper bound on the maximum time certain key steps can occur (as is standard, we assume is determined by an adversary, can change between executions, and is unknown to the algorithm).
We then tackle the question of whether it is still possible to show that the time complexity of information spreading improves with vertex expansion in an asynchronous setting. The corresponding analyses in the synchronous MTM, which treats nodes as implicitly running an approximate maximum matching algorithm between nodes that know a certain token and those that do not, depend heavily on the synchronization of node behavior.
We introduce a novel analysis technique, in which we show that the probabilistic connection behavior in the aMTM over time sufficiently approximates synchronized behavior to allow our more abstract graph theory results to apply. In particular, we prove that for , the single message spreads in at most time. This result falls somewhere between our previous result for gossip with in the aMTM, and the bound of rounds possible in the synchronous MTM for . The remaining gap with the synchronous results seems due to the ability of synchronous algorithms to keep a history of recent connection attempts (crucial to the underlying matching analysis), whereas in the asynchronous model such histories might be meaningless if some nodes are making connection attempts much faster than others.
We argue that our introduction of the aMTM, as well as a powerful set of tools for analyzing information spreading in this setting, provides an important foundation for the future study of communication processes in realistic smartphone peer-to-peer models.
II Related Work
In recent years, there has been a growing amount of research on smartphone peer-to-peer networking [14, 15, 16, 17, 18, 19, 20] (see  for a survey). There has also been recent work on using device-to-device links to improve cellular network performance, e.g., the inclusion of peer-to-peer connections in the emerging LTE-advanced standard [22, 23, 24], but these efforts differ from the peer-to-peer applications studied here as they typically assume coordination provided by the cellular infrastructure.
In this paper, we both study and extend the mobile telephone model introduced in 2016 by Ghaffari and Newport . This model modifies the classical telephone model of wired peer-to-peer networks (e.g., [2, 3, 4, 5, 6, 7, 8, 9, 10]) to better match the constraints and capabilities of the smartphone setting. In particular, the mobile telephone model differs from the classical telephone model in that it allows small advertisements but restricts the number of concurrent connections at a given node. As argued in , these differences (especially the latter) significantly change achievable results, algorithm strategy, and analysis techniques. The details of this model are inspired, in particular, by the Multipeer Connectivity framework offered in iOS.
Our random spread gossip algorithm disseminates rumors in at most rounds in the mobile telephone model in a network with nodes and vertex expansion (see Section III-C). The previous best known algorithm for this model is crowded bin gossip , which is significantly more complicated and requires rounds. (In , crowded bin is listed as requiring rounds, but that result assumes single bit advertisements in each round, requiring devices to spell out control information over many rounds of advertising. To normalize with this paper, in which tags can contain bits, crowded bin’s time complexity improves by a factor.) We note that  also explores slower gossip solutions for more difficult network settings not considered here; e.g., changing network topologies and the absence of advertisements.
To put these time bounds into context, we note that previous work in the mobile telephone model solved rumor spreading  and leader election  in rounds. In the classical telephone model, a series of papers [7, 8, 9, 10] (each optimizing the previous) established that simple random rumor spreading requires rounds , which is optimal in the sense that for many values, there exist networks with a diameter in . The fact that our gossip solution increases these bounds by a factor of (ignoring log factors) is natural given that we allow only a constant number of tokens to be transferred per round.
As mentioned, random gossip processes more generally have been studied in other network models. These abstractions generally model time as synchronized rounds and by definition require nodes to select a neighbor uniformly at random in each round  . More recent work has demonstrated that these protocols take advantage of key graph properties such as vertex expansion and graph conductance . Asynchronous variants of these protocols have also been explored, where asynchrony is captured by assigning each node a clock following an unknown but well-defined probability distribution . The asynchronous MTM introduced in our paper, by contrast, deploys a more general and classical approach to asynchrony in which an adversarial scheduler controls the time required for key events in a worst-case fashion.
III Random Gossip in the Mobile Telephone Model
Here we study a simple random gossip process in the mobile telephone model. We begin by formalizing the model, the problem, and some graph theory preliminaries, before continuing with the algorithm description and analysis.
III-A The Mobile Telephone Model
The mobile telephone model describes a smartphone peer-to-peer network topology as an undirected connected graph $G=(V,E)$. A computational process (called a node in the following) is assigned to each vertex in $V$. The edges in $E$ describe which node pairs are within communication range. In the following, we use $u \in V$ to indicate both the vertex in the topology graph as well as the computational process (node) assigned to that vertex. We use $n = |V|$ to indicate the network size.
Executions proceed in synchronous rounds labeled , and we assume all nodes start during round . At the beginning of each round, each node selects an advertisement to broadcast to its neighbors in . This advertisement is a bit string containing no more than bits, where is the digest length of a standard hash function parameterized to obtain the desired collision resistance guarantees. After broadcasting its advertisement, node then receives the advertisements broadcast by its neighbors in for this round.
At this point, decides to either send a connection invitation to a neighbor, or passively receive these invitations. If decides to receive, and at least one connection invitation arrives at , then node can select at most one such incoming invitation to accept, forming a connection between and the node that sent the accepted invitation. Once and are connected, they can perform a bounded amount of reliable interactive communication before the round ends, where the magnitude of this bound is specified as a parameter of the problem studied. Notice that the model does not guarantee to deliver all invitations sent to by its neighbors. It instead only guarantees that if at least one neighbor of sends an invitation, then will receive a non-empty subset (selected arbitrarily) of these invitations before it must make its choice about acceptance.
If instead chooses to send a connection invitation to a neighbor , there are two outcomes. If accepts ’s invitation, a connection is formed as described above. Otherwise, ’s invitation is implicitly rejected.
III-B The Gossip Problem
The gossip problem is parameterized with a token count . It assumes unique tokens are distributed to nodes at the beginning of the execution. The problem is solved once all nodes have received all tokens. We treat the tokens as black box objects that are large compared to the advertisements. With this in mind, we assume the only ways for a node to learn token are: (1) starts with token ; or (2) a node that previously learned sends the token to during a round in which and are connected.
We assume that at most a constant number of tokens can be sent over a given connection. Notice that this restriction enforces a trivial round lower bound for the problem.
III-C Vertex Expansion
Some network topologies are more suitable for information dissemination than others. In a clique, for example, a message can spread quickly through epidemic replication, while spreading a message from one endpoint of a line to another is necessarily slow. With this in mind, the time complexity of information dissemination algorithms is often expressed with respect to graph connectivity metrics such as vertex expansion or graph conductance. In this way, an algorithm’s performance can be proved to improve along with available connectivity.
In this paper, as in previous studies of algorithms in the mobile telephone model [1, 11, 12, 13], we express our results with respect to vertex expansion (see  for an extended discussion of why this metric is more appropriate than conductance in our setting). Here we define this metric and establish a useful related property.
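For a fixed undirected connected graph $G=(V,E)$, and a given $S \subseteq V$, we define the boundary of $S$, indicated $\partial S$, as follows: $\partial S = \{v \in V \setminus S : \exists u \in S \text{ with } \{u,v\} \in E\}$: that is, $\partial S$ is the set of nodes not in $S$ that are directly connected to $S$ by an edge in $E$. We define $\alpha(S) = |\partial S| / |S|$. We define the vertex expansion $\alpha$ of a given graph $G=(V,E)$ as follows:

$$\alpha = \min_{S \subset V,\; 0 < |S| \le n/2} \alpha(S).$$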
Notice that despite the possibility of for some , we always have . In more detail, this parameter ranges from for poorly connected graphs (e.g., a line) to values as large as for well-connected graphs (e.g., a clique). Larger values indicate more potential for fast information dissemination.
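For small graphs, the metric can be computed directly by enumerating subsets. The sketch below assumes the standard definition of vertex expansion, namely the minimum of $|\partial S|/|S|$ over non-empty $S$ with $|S| \le n/2$; the helper names (`boundary`, `vertex_expansion`, `line`, `clique`) are hypothetical, and the enumeration is exponential in $n$, so it is intended for illustration only.

```python
from itertools import combinations

def boundary(adj, S):
    """Nodes outside S with at least one neighbor inside S."""
    S = set(S)
    return {v for u in S for v in adj[u] if v not in S}

def vertex_expansion(adj):
    """Brute-force min over non-empty S with |S| <= n/2 of |boundary(S)| / |S|."""
    nodes = list(adj)
    n = len(nodes)
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for S in combinations(nodes, size):
            best = min(best, len(boundary(adj, S)) / size)
    return best

def line(n):
    """Path graph on nodes 0..n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def clique(n):
    """Complete graph on nodes 0..n-1."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

if __name__ == "__main__":
    print(vertex_expansion(line(8)))    # poorly connected
    print(vertex_expansion(clique(8)))  # well connected
```

On the 8-node line the minimum is attained by the first four nodes (boundary of size one), while on the 8-node clique every half-sized set has a boundary as large as itself, which illustrates the contrast between the two extremes discussed above.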
The mobile telephone model requires the set of pairwise connections in a given round to form a matching in the topology graph . This induces a connection between maximum matchings and the maximum amount of potential communication in a given round. Here we adapt a useful result from  that formalizes the relationship between vertex expansion and these matchings as defined with respect to a given partition.
In more detail, for a given graph and node subset , we define to be the bipartite graph with bipartitions , and the edge set , , and . Recall that the edge independence number of a graph , denoted , describes the size of a maximum matching on . For a given , therefore, describes the maximum number of concurrent connections that a network can support in the mobile telephone model between nodes in and nodes outside of . This property follows from the restriction in this model that each node can participate in at most one connection per round.
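The maximum matching across a cut can be computed with a standard augmenting-path routine. The following sketch is an illustration only: the function name `cut_matching_size` and the example graphs are hypothetical, and the routine uses Kuhn's algorithm restricted to edges with exactly one endpoint in the chosen subset.

```python
def cut_matching_size(adj, S):
    """Size of a maximum matching using only edges between S and V \\ S."""
    S = set(S)
    match = {}  # node outside S -> its matched partner inside S

    def try_augment(u, seen):
        # Try to match u (in S) to a node outside S, re-routing earlier
        # matches along an augmenting path when necessary.
        for v in adj[u]:
            if v in S or v in seen:
                continue
            seen.add(v)
            if v not in match or try_augment(match[v], seen):
                match[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in S)

# Example topologies: a star (center 0, leaves 1..4) and a 6-node clique.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
clique6 = {i: [j for j in range(6) if j != i] for i in range(6)}

if __name__ == "__main__":
    print(cut_matching_size(star, {1, 2}))        # both leaves compete for the center
    print(cut_matching_size(clique6, {0, 1, 2}))  # every node in S can be matched
```

The star shows why the matching size, rather than the number of cut edges, bounds concurrent connections: two leaves share a single possible partner, so only one of them can connect in a given round.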
The following result notes that the vertex expansion does a good job of approximating the size of the maximum matching across any partition:
Lemma III.1 (from ).
Fix a graph with with vertex expansion . Let . It follows that .
III-D The Random Spread Gossip Algorithm
We formalize our random spread gossip algorithm with the pseudocode labeled Algorithm 1. Here we summarize its behavior.
The basic idea of the algorithm is that in each round, each node advertises a hash of its token set. Nodes then attempt to connect only to neighbors that advertised a different hash, indicating their token sets are different. When two nodes connect, they can transfer a constant number of tokens in the non-empty set difference of their respective token sets.
As detailed in the pseudocode, the random spread algorithm implements the above strategy combined with some minor additional structure that supports the analysis. In particular, nodes partition rounds into phases of length , where is an upper bound on the maximum degree in the network topology. Instead of each node deciding whether to send or receive connection invitations at the beginning of each round, they make this decision at the beginning of each phase, and then preserve this decision throughout the phase (this is captured in the pseudocode with the flag that is randomly set every rounds). Each receiver node also advertises whether or not it has been involved in a connection already during the current phase (as captured with the flag). A sender node will only consider neighbors that advertise a different hash, are receivers in the current phase, and have not yet been involved in a connection during the phase.
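The phase structure summarized above can be sketched as a small simulation. This is a hedged illustration rather than a transcription of Algorithm 1: the name `random_spread` is hypothetical, the phase length is passed in as the parameter `phase_len` (an assumption, since its value depends on the degree bound), direct comparison of token sets stands in for comparing advertised hashes (that is, no hash collisions are assumed), tokens are assumed to be mutually comparable values, and one token is exchanged in each direction per connection.

```python
import random

def random_spread(adj, tokens, phase_len, seed=0, max_rounds=100_000):
    """Return the number of rounds executed before all nodes know all tokens.

    adj:    dict node -> list of neighbors (undirected topology)
    tokens: dict node -> set of tokens currently known (mutated in place)
    """
    rng = random.Random(seed)
    everything = set().union(*tokens.values())
    sender, connected = {}, {}
    for r in range(max_rounds):
        if all(t == everything for t in tokens.values()):
            return r
        if r % phase_len == 0:
            # New phase: redraw sender/receiver status, reset receiver flags.
            sender = {u: rng.random() < 0.5 for u in adj}
            connected = {u: False for u in adj}
        ads = {u: frozenset(tokens[u]) for u in adj}  # stand-in for hashes
        invites = {u: [] for u in adj}
        for u in adj:
            if not sender[u]:
                continue
            # Eligible: different advertisement, receiver this phase,
            # and not yet connected during this phase.
            eligible = [v for v in adj[u]
                        if ads[v] != ads[u] and not sender[v] and not connected[v]]
            if eligible:
                invites[rng.choice(eligible)].append(u)
        for v, senders in invites.items():
            if not senders:
                continue
            u = rng.choice(senders)                   # accept one invitation
            tokens[v].update(sorted(tokens[u] - tokens[v])[:1])
            tokens[u].update(sorted(tokens[v] - tokens[u])[:1])
            connected[v] = True                       # flag for rest of phase
    return None

if __name__ == "__main__":
    line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
    start = {i: ({i} if i in (0, 5) else set()) for i in range(6)}
    print(random_spread(line, start, phase_len=3, seed=2))
```

Holding the sender/receiver decision fixed for a whole phase means a phase can make little progress if the coin flips are unlucky, which is why the analysis argues about progress with constant probability per phase.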
III-E Analysis of Random Spread Gossip
Our goal is to prove the following result about the performance of random spread gossip:
Theorem III.2.
With high probability, the random spread gossip algorithm solves the gossip problem in rounds, when executed with initial tokens and degree bound , in a network topology graph of size , maximum degree , and vertex expansion .
We begin by establishing some preliminary notations and assumptions before continuing to the main proof argument.
For a fixed execution, let be the non-empty set of tokens that the algorithm must spread. For each round and node , let be the tokens (if any) “known” by at the start of round (that is, the tokens that starts with as well as every token it received through a connection in rounds to ).
For each , and round , let be the nodes that know token at the start of round . Let be the number of nodes that know token at the beginning of this round, and let .
Finally, let be a token with the maximum value in this round (breaking ties arbitrarily). According to Lemma III.1, which connects vertex expansion to matchings, there is a matching between nodes in and of size at least . Token , in other words, has the largest guaranteed potential to spread in round among all tokens. (To be slightly more precise, is a lower bound on the size of the matching across the cut defined by , so is the token with the largest lower bound guarantee on the size of its matching.) Accordingly, in the analysis that follows, we will focus on this token in each phase to help lower bound the amount of spreading we hope to achieve.
Productive Connections and Hash Collisions
In the following, we say a given pairwise connection between nodes and in some round is productive if . That is, at least one of these two nodes learns a new token during the connection. By the definition of our algorithm, if and connect in round , then it must be the case that , where is the hash function used by the random spread gossip algorithm. This implies —indicating that every connection created by our algorithm is productive.
On the other hand, it is possible for some , , and that even though , due to a hash collision. For the sake of clarity, in the analysis that follows we assume that no hash collisions occur in the analyzed execution. Given the execution length is polynomial in the network size , and there are at most different token sets hashed in each round, for standard parameters the probability of a collision among this set would be extremely small, supporting our assumption.
We emphasize, however, that even if a small number of collisions do occur, their impact is minimal on the performance of random spread gossip. The worst outcome of a hash collision in a given round is that during that single round a potentially productive connection is not observed to be productive and is therefore temporarily ignored. As will be made clear in the analysis that follows, the impact of this event is nominal. Indeed, even if we assumed that up to a constant fraction of the hashes in every round generated collisions (an extremely unlikely event for all but the weakest hash function parameters), the algorithm’s worst-case time complexity would degrade by at most a constant factor.
Recall that our algorithm partitions rounds into phases of length . For each phase , let be the first round of that phase. Fix some arbitrary phase and consider token , which, as argued above, is the token with the largest guaranteed potential to spread in round . Our goal in this part of the analysis is to prove that with constant probability, our algorithm will create enough productive connections during this phase to well-approximate this potential. This alone is not enough to prove our algorithm terminates efficiently, as in some phases, it might be the case that no token has a large potential to spread. The next part of our argument will tackle this challenge by proving that over a sufficient number of phases the aggregate amount of progress must be large.
We begin by establishing the notion of a productive subgraph:
Definition III.3.
At the beginning of any round , we define the productive subgraph of the network topology for as: , where , and for each , indicates the value of the node ’s status bit for the phase containing round .
That is, the productive subgraph for round is the subgraph of that contains only edges where the endpoints: (1) have different token sets; and (2) have different statuses (one is a sender during this phase and one is a receiver). This subgraph contains every possible connection for a given round of our gossip algorithm (we ignore flags because, as will soon be made clear, we consider these graphs defined only for the first round of phases, a point at which all flags are reset to ). Accordingly, a maximum matching on this subgraph upper bounds the maximum number of concurrent connections possible in a round.
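The two edge conditions translate directly into a filter over the topology's edge list. The sketch below is illustrative only; the function name `productive_subgraph` and the example data are hypothetical, and token sets are compared directly, which corresponds to assuming no hash collisions.

```python
def productive_subgraph(edges, token_sets, is_sender):
    """Keep edges whose endpoints differ in token set AND in sender/receiver status."""
    return [(u, v) for (u, v) in edges
            if token_sets[u] != token_sets[v] and is_sender[u] != is_sender[v]]

# Example: a 4-node line at the first round of some phase.
edges = [(0, 1), (1, 2), (2, 3)]
token_sets = {0: {"a"}, 1: {"a"}, 2: {"b"}, 3: {"a", "b"}}
is_sender = {0: True, 1: False, 2: True, 3: True}

if __name__ == "__main__":
    # (0, 1): same token sets, dropped.
    # (1, 2): different token sets, receiver and sender, kept.
    # (2, 3): different token sets but both senders, dropped.
    print(productive_subgraph(edges, token_sets, is_sender))
```

Since every surviving edge joins a sender to a receiver, the result is bipartite with senders on one side and receivers on the other.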
We begin by lower bounding the size of the maximum matching in a productive subgraph at the beginning of a given phase using the token . Recall that is the number of nodes that know token at the beginning of , if less than half know the token, and otherwise indicates the number of nodes that do not know .
Lemma III.4.
Fix some phase . Let . Let be the productive subgraph for round , be a maximum matching on , and . With constant probability (defined over the assignments): .
Proof. Fix some phase . We define to be the potentially productive subgraph for round , where potentially productive is defined the same as productive except we omit the requirement that endpoints of edges in the graph have different values. Let be a maximum matching on and . We will reason about as an intermediate step toward bounding the size of the actual productive subgraph for this round.
Let . Consider the cut between nodes that know , and nodes that do not, at the beginning of this phase. By Lemma III.1, there is a matching across this cut of size at least . By definition, for all edges across this cut, their endpoints have different token sets at the beginning of round , therefore they are all candidates to be included in , implying that .
Our next step is to consider the random assignment of sender and receiver status to nodes in at the beginning of phase . For an edge in to be included in a matching on the productive subgraph , it must be the case that one endpoint chooses to be a receiver while the other chooses to be a sender. We call such an edge good. For any particular edge , this occurs with probability .
For each such , let be the random indicator that evaluates to if is good, and evaluates to otherwise. Let be the number of good edges for this phase. By our above probability calculation, we know:
Because is a matching, these indicator variables are independent. This allows us to concentrate on the mean. In particular, we will apply the following multiplicative Chernoff Bound, defined for and any :
with , to establish that the probability that is upper bounded by:
It follows that is less than or equal to , which is itself greater than or equal to with a probability upper bounded by a constant, as required. (Clearly, the specific worst failure bound of is loose (in the worst case, where , for example, we can directly calculate that with probability ). We are not, however, attempting to optimize constants in this analysis, so any constant bound is sufficient for our purposes.) ∎
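The concentration step can be checked numerically. The sketch below compares the exact lower tail of a binomial with a multiplicative Chernoff bound, under assumptions that are ours rather than the text's: sender/receiver status is chosen by fair independent coins, so each of the $m$ matching edges is good independently with probability $1/2$; the bound used is the common lower-tail form $\Pr[X \le (1-\delta)\mu] \le e^{-\delta^2 \mu / 2}$ for $0 < \delta < 1$; and $\delta = 1/2$ is picked purely for illustration. The function names are hypothetical.

```python
from math import comb, exp

def lower_tail(m, threshold):
    """Exact Pr[X <= threshold] for X ~ Binomial(m, 1/2)."""
    return sum(comb(m, j) for j in range(0, int(threshold) + 1)) / 2 ** m

def chernoff_lower(mu, delta):
    """Multiplicative Chernoff bound on Pr[X <= (1 - delta) * mu]."""
    return exp(-delta ** 2 * mu / 2)

for m in (1, 4, 16, 64, 256):
    mu = m / 2                        # expected number of good edges
    exact = lower_tail(m, mu / 2)     # Pr[X <= mu / 2], i.e. delta = 1/2
    bound = chernoff_lower(mu, 0.5)   # equals exp(-m / 16)
    assert exact <= bound             # the bound must dominate the exact tail
    print(m, round(exact, 6), round(bound, 6))
```

For a single edge the bound is far from tight, which is consistent with the remark that constants are not being optimized; for larger matchings both the exact tail and the bound shrink quickly.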
We now turn our attention to our gossip algorithm’s ability to take advantage of the potential productive connections captured by the productive subgraph defined at the beginning of the phase. To do so, we first adapt a useful result on rumor spreading from  to the behavior of our gossip algorithm. Notice that it is the proof of the below adapted lemma that requires the use of the flag in our algorithm.
Lemma III.5 (adapted from Theorem 7.2 in ).
Fix a phase . Let be a subgraph of the productive subgraph that satisfies the following:
there is a matching of size in ;
the set of nodes in with sender status is of size ; and
for each node , every neighbor of in is in .
With constant probability (defined over the random neighbor choices), during the first rounds of phase , at least neighbors of nodes in in participate in a productive connection.
Proof. The original version of this theorem from  requires that is a bipartite graph. This follows in our case because it is a subset of a productive subgraph. All subsets of productive subgraphs are bipartite, since one can put the nodes with sender status in one bipartition and nodes with receiver status in the other (by definition the only edges in a productive subgraph are between sender and receiver nodes).
Another difference is that our theorem studies our gossip algorithm, while the theorem from  studies the PPUSH rumor spreading process. The PPUSH process assumes a single rumor spreading in the system. Some nodes know the rumor (and are called informed) and some nodes do not (and are called uninformed). In each round, each node declares whether or not they are uninformed. Each informed node randomly chooses an uninformed neighbor (if any such neighbors exist) and tries to form a connection, changing the receiver’s status to informed.
The original version of the theorem states that if you execute PPUSH for rounds, at least nodes that neighbor are informed. If we consider senders to be informed and receivers to be uninformed, our gossip algorithm behaves the same as PPUSH in rounds under consideration. That is, the senders in will randomly select a receiver neighbor to attempt a connection.
Once a receiver in participates in a connection in our algorithm, it sets its flag to for the remainder of the phase, preventing future attempts to connect to it during the phase. This matches the behavior in PPUSH where once a node becomes informed, informed neighbors stop trying to connect to it. This congruence allows us to derive the same bound derived for PPUSH in . ∎
Lemma III.6.
Fix some phase . Let . With constant probability, the number of productive connections in this phase is in .
Proof. Fix some phase . By Lemma III.4, with some constant probability , the productive subgraph has a matching of size once nodes randomly set their flags.
Now consider the subgraph that consists of every sender endpoint in , and for each such sender , every receiver that neighbors , as well as the edge . This subgraph satisfies the conditions of Lemma III.5 for . Applying this lemma, it follows that with some constant probability , during this phase, the random neighbor selections by senders will generate at least productive connections.
Combining these two results, we see that with constant probability , we have at least productive connections, as claimed by the lemma statement. ∎
The Size Band Table
In the previous part of this analysis, we proved that with constant probability the number of productive connections in phase is bounded with respect to the number of nodes that know . In the worst case, however, might be quite small (e.g., at the beginning of an execution where each token is known by only a constant number of nodes, this value is constant). We must, therefore, move beyond a worst-case application of Lemma III.6, and amortize the progress over time to something more substantial.
To accomplish this goal, we introduce a data structure—a tool used only in the context of our analysis—that we call a size band table, which we denote as . This table has one column for each token , and rows which we number .
As we will elaborate below, each row is associated with a range of values that we call a band. We call rows through growth bands, and rows through shrink bands. Each cell in contains a single bit. We update these bit values after every round of our gossip algorithm to reflect the extent to which each token has spread in the system.
In more detail, for each round , we use to describe the size band table at the beginning of round . For each token and row , we use to refer to the bit value in row of the column dedicated to token in the table for round .
Finally, we define each of these bit values as follows. For each round , token , and growth band (i.e., for each ), we define:
Symmetrically, for each round , token , and shrink band (i.e., for each ), we define:
A key property of the size band table is that as a given token spreads, the cells in its column with bits grow from the smaller rows toward the larger rows. That is, if row is at the beginning of a given round, all smaller rows for that token are also at the beginning of that round. Furthermore, because nodes never lose knowledge of a token, once a cell is set to , it remains .
When all rows for a given token are set to , it follows that all nodes know . This follows because the definition of shrink band being set to is that the number of nodes that do not know is strictly less than:
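The band thresholds are not reproduced in the text above, so the following sketch fixes one plausible choice purely for illustration. It assumes that the number of nodes is a power of two, that the table has one growth band and one shrink band per power of two, that growth band i is set once at least 2^(i-1) nodes know the token, and that the j-th shrink band is set once strictly fewer than n/2^j nodes are missing it. The function names are hypothetical. Under these assumptions the sketch reproduces the two properties stated above: set bits form a prefix of the column, and the column is full exactly when every node knows the token.

```python
import math

def size_band_column(n, count):
    """Return the bits for rows 1..2*log2(n) of one token's column.

    n: number of nodes (assumed to be a power of two in this sketch).
    count: number of nodes that currently know the token.
    """
    L = int(math.log2(n))
    assert 2 ** L == n, "this sketch assumes n is a power of two"
    # Assumed growth thresholds: row i is set once count >= 2^(i-1).
    growth = [1 if count >= 2 ** (i - 1) else 0 for i in range(1, L + 1)]
    # Assumed shrink thresholds: row L+j is set once fewer than n/2^j nodes lack the token.
    shrink = [1 if (n - count) < n / 2 ** j else 0 for j in range(1, L + 1)]
    return growth + shrink

def ones_form_prefix(column):
    """Check that every set bit appears before every unset bit."""
    return column == sorted(column, reverse=True)
```

For example, with eight nodes a token known by a single node sets only the first row, and the last row is set only once no node is missing the token.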
Amortized Analysis of Size Band Table Progress
As the size band table increases the number of bits, we say it progresses toward a final state of all bits. Here we perform an amortized analysis of size band table progress.
To do so, we introduce some notation. For each phase , and token , let be the largest row number that contains a in ’s column in . We call this the current band for token in phase .
Let define the distance from the current band of token to the center row number . By the definition of , no token has a current band closer to than at the start of phase . We say that phase is associated with the current band for .
Finally, for a given phase , with , we say this phase is successful if the number of productive connections during the phase is at least as large as the lower bound specified by Lemma III.6; i.e., there are at least productive connections, where is the constant hidden in the asymptotic bound in the lemma statement.
Our first goal in this part of the analysis is to bound the number of successful phases that can be associated with each band. To do so, we differentiate between two different types of successful phases, and then bound each separately.
Fix some phase that is associated with some band at distance from the center of the size band table. We say phase is an upgrade phase if there exists a subset of the productive connections during phase that push some token ’s current band to a position with . If a phase is not an upgrade phase, and at least one node is missing at least one token, we call it a fill phase.
Stated less formally, we call a phase an upgrade phase if it pushes some token’s count closer to the center of the size band table—row —than the band associated with the phase. Our definition is somewhat subtle in that it must handle the case where during a phase a token count does grow to be closer to the center of the size band table, but then its count continues to grow until it pushes more than distance above the center. We still want to count this as an upgrade phase (hence the terminology about there existing some subset of the connections that push the count closer).
Our goal is to bound the number of successful phases possible before all tokens are spread. We begin with a bound on upgrade phases (which holds whether or not the phase is successful). Our subsequent bound on fill phases, however, considers only successful phases.
There can be at most upgrade phases.
Fix some band . Consider an upgrade phase that is associated with . By the definition of an upgrade phase, there is some token with a current band at the start of that is distance at least from the center of the table, but that has its count grow closer to the center during the phase.
We note that it must be the case that ’s current band at the start of phase is a growth band. This holds because if ’s current band is a shrink band then additional spreading of token can only increase its distance from the center of the size band table.
If is a growth band, then it follows that ’s current band starts phase no larger than and ends phase larger, because current bands for a token never decrease. Moving forward, therefore, token can never again be the cause of a phase associated with band to be categorized as an upgrade phase.
On the other hand, if is a shrink band, we know that after phase , token ’s distance will remain closer to the center of the table than until ’s current band becomes a shrink band. Once again, therefore, moving forward token can never again cause a phase associated with band to be categorized as an upgrade.
The lemma statement follows as there are bands, and for each band, each of the tokens can cause a phase associated with that band to be categorized as an upgrade phase at most once. ∎
We now bound the number of successful fill phases. To do so, we note that the number of fill phases associated with a given band is bounded by the worst case number of connections needed before some token’s count must advance past that band. For bands associated with large ranges this worst case number is large. As shown in the following lemma, however, the number of connections in phases associated with large bands grows proportionally large as well. This balancing of growth required and growth obtained is at the core of our amortized analysis.
There can be at most successful fill phases.
Consider a group of successful fill phases associated with some band at distance from the center of the size band table. Because these are fill phases, the productive connections generated during these phases can never push some token’s count (perhaps temporarily) closer than distance from the center of the table (any phase in which this occurs becomes, by definition, an upgrade phase).
One way to analyze the distribution of the productive connections during these phases is to consider a generalization of the size band table in which we record in each cell the total number of productive connections that spread token while its count falls into the band associated with row . (Of course, many connections for a given token might occur in a given round, in which case we process them one by one in an arbitrary order while updating the cell counts.)
If we apply this analysis only for the fill phases fixed above, then we know that the counts in all cells of distance less than from the center of the table remain at . By the definition of the size band table, for a given token , the maximum number of connections we can add to cells of distance at least from the center is loosely upper bounded by (the extra factor of two captures both growth and shrink band cells at least distance ). Therefore, the total number of productive connections we can process into cells at distance at least is at most .
By the definition, each phase that is a successful fill phase associated with generates at least productive connections, where . By the definition of , ’s current band is distance from the center. Therefore, is within a factor of of . By absorbing that constant factor into the constant (to produce a new constant ), it follows that this phase generates at least
new productive connections. Combined with our above upper bound on the total possible productive connections for successful fill phases associated with , it follows that the total number of successful fill phases associated with is less than:
We multiply this bound over possible bands to derive total possible successful fill phases, providing the bound claimed by the lemma statement. ∎
Pulling Together the Pieces
We are now ready to combine the above lemmas to prove our main theorem.
Proof (of Theorem III.2).
By Lemma III.6, if the token spreading is not yet complete, then the probability that the current phase is successful is lower bounded by some constant probability . The actual probability might depend on the execution history up until the current phase, but the lower bound of always holds, regardless of this history. We can tame these dependencies with a stochastic dominance argument.
In more detail, for each phase before the tokens are spread, we define a trivial random variable that is with independent probability , and otherwise . Let , by contrast, be the random indicator variable that is if phase is successful, and otherwise . For each phase that occurs after the tokens are spread, by default.
Note that for each , stochastically dominates . It follows that if is greater than some with some probability , then is greater than with probability at least .
With this established, consider the first phases, for some constant . Note that for this value of , . Because is the sum of independent random variables, it concentrates around this expectation. In particular, we once again apply the following form of a Chernoff Bound:
for , , and , to derive that the probability that , is upper bounded by . The same bound therefore holds for the probability that . Notice that this error bound is polynomially small in with an exponent that grows with constant . It follows, therefore, that with high probability in , token spreading succeeds in the first phases.
To achieve the final round complexity bound claimed by the theorem statement, we multiply this upper bound on phases by the length of rounds per phase. ∎
IV Random Gossip in the Asynchronous Mobile Telephone Model
The mobile telephone model captures the basic dynamics of the peer-to-peer libraries included in standard smartphone operating systems. This abstraction, however, makes simplifying assumptions—namely, the assumption of synchronized rounds. In this section we analyze the performance of simple random gossip processes in a more realistic version of the model that eliminates the synchronous round assumption. In particular, we first define the asynchronous mobile telephone model (aMTM), which describes an event-driven peer-to-peer abstraction in which an adversarial scheduler controls the timing of key events in the execution.
An algorithm specified in the aMTM should be directly implementable on real hardware without the need to synchronize or simulate rounds. This significantly closes the gap between theory and practice. With this in mind, after defining the aMTM, we specify and analyze a basic random gossip process strategy. In this more realistic asynchronous model, different processes can be running at vastly different and changing speeds, invalidating the clean round-based analysis from the previous section. We will show, however, that even in this more difficult setting, random gossip processes can still be analyzed and shown to spread tokens with speed that increases with available connectivity.
IV-A The Asynchronous Mobile Telephone Model
Since the pattern of communication in the asynchronous setting can be complex, our first goal in creating our new abstraction is to impose a simple but flexible structure for how processes communicate with each other. To this end, we introduce a meta-algorithm that is run by each process individually, independent of all other processes in the network. This allows us to analyze the running time of a particular instance of an algorithm and, from there, the performance of the algorithm across all concurrent network instances.
We will require two primary properties from our algorithmic structure. First, for our protocols to be truly asynchronous, they will not be able to follow a static procedural flow. Namely, after performing some action, an algorithm in this model may have to wait an indeterminate amount of time before performing another action or even being notified of the results of the first action. While we can parameterize an upper bound for this delay in the model for the sake of our analysis, it is unrealistic for an instance of the algorithm to be aware of this parameter. Second, we would like to abstract away the details of the asynchronous communication from the specifics of the algorithm, allowing us to keep our algorithm descriptions as simple as possible.
We accomplish both of these goals by implementing a structure that resembles a looped synchronous algorithm but regulates its execution through access to data members that are updated asynchronously. Formalized in Algorithm 2, the protocol initializes three fields:
: A key-value store of references to neighboring processes whose advertisements have been received along with their advertisement tags. This set is maintained asynchronously by the model and updated whenever a new advertisement is received. Whenever a new advertisement is received, it replaces the last known advertisement for the corresponding neighboring process.
: An enumerated type field chosen from the set . Also modified asynchronously by the model, this field signifies the current progress in any connections the process is involved in.
: A nullable reference to a single neighbor for communication purposes after a connection is formed.
While these fields accomplish our first goal of enabling our algorithms to execute asynchronously, we satisfy our second goal of abstracting communication details from the implementing algorithm by exposing an interface of four functions:
Initialize(): Initialization of algorithm-specific data.
GetTag(): Return the advertisement tag for this process which is then broadcast to all neighboring processes.
Select(): Return a neighbor (or null for no neighbor) to connect to from among those discovered.
Communicate(): Perform a bounded amount of communication with selected neighbor .
The execution of an iteration of the algorithm loop begins by getting the process’ advertisement tag and broadcasting it to all neighboring processes. The model then blocks until a reference to a neighboring process is added to the set. Once the set contains at least one neighbor, the implementing algorithm selects one neighbor from the set and returns it. If the selected neighbor isn’t null, the protocol then attempts to connect with the selected neighbor, and blocks for another indeterminate duration of time for the connection attempt to succeed or fail. If the connection succeeds, the two connected processes communicate before proceeding to the next iteration.
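The loop body described above can be sketched as follows. This is not a reproduction of Algorithm 2. The Python method names mirror the interface functions and the model functions named in the text, while the `ScriptedModel` class is a hypothetical stand-in that returns canned values where a real implementation would block.

```python
from abc import ABC, abstractmethod

class GossipAlgorithm(ABC):
    """The four-function interface (Python names are illustrative)."""

    @abstractmethod
    def initialize(self):
        """Set up algorithm-specific data."""

    @abstractmethod
    def get_tag(self):
        """Return the advertisement tag to broadcast."""

    @abstractmethod
    def select(self, neighbors):
        """Return a discovered neighbor to connect to, or None."""

    @abstractmethod
    def communicate(self, neighbor):
        """Perform a bounded amount of communication with the neighbor."""

def run_iteration(alg, model):
    """Execute one loop iteration; return True if a connection succeeded."""
    model.update(alg.get_tag())                     # broadcast the advertisement tag
    neighbors = model.block_for_neighbor_updates()  # would block until non-empty
    target = alg.select(neighbors)
    if target is None:
        return False                                # nothing to connect to
    model.connect(target)
    if not model.block_for_connection():            # would block until success/failure
        return False
    alg.communicate(target)
    return True

class ScriptedModel:
    """Hypothetical test double: canned answers instead of real blocking calls."""

    def __init__(self, neighbors, connection_succeeds):
        self.neighbors = dict(neighbors)  # neighbor reference -> advertised tag
        self.connection_succeeds = connection_succeeds
        self.log = []

    def update(self, tag):
        self.log.append(("update", tag))

    def block_for_neighbor_updates(self):
        return self.neighbors

    def connect(self, neighbor):
        self.log.append(("connect", neighbor))

    def block_for_connection(self):
        return self.connection_succeeds
```

Keeping the blocking calls on the model object leaves the algorithm side free of timing logic, which matches the stated goal of separating communication details from the algorithm description.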
We assume that each step of the protocol executes instantly with the exception of the model functions blockForNeighborUpdates() and blockForConnection() and the algorithm function Communicate(). These functions implicitly block the protocol’s execution. The model functions block the execution until the and fields are available to be referenced by the algorithm, respectively, while Communicate() stalls until the connected nodes communicate. In order for this abstraction to be useful to our analysis, however, we need to parameterize the maximum duration of these blocking events. We therefore define the corresponding model parameters , , , and which are not known in advance and can change between executions:
: If a process calls update() at time , will be added to the set of all neighboring processes by time at the latest. This is the maximum time for step 14 of the protocol.
: Conversely, if a process calls update() at time , no neighboring process will add to their neighbors set after time where .
: If a process calls connect() at time , by time at the latest, either the connection attempt will have failed or and will have successfully connected. This is the maximum time for step 18 of the protocol.
: As stated in the model description, once a connection is formed, the connected processes may engage in a bounded amount of communication, defines the maximum time required for this communication to occur. This is the maximum time for step 21 of the protocol.
Notice that the specified model only defines how to attempt outgoing connections. While this abstraction is similar to the mobile telephone model in that it restricts a process to one such connection attempt at a time, it will deviate slightly by allowing a single incoming connection attempt as well. This allowance will ease our analysis of algorithms in this setting as it frees a process to accept an incoming connection attempt regardless of its current state. For now, we will assume the process of accepting incoming connection attempts is simply to accept the first connection attempt received and call Communicate() where is the source of the incoming connection.
IV-B The Asynchronous Random Spread Gossip Algorithm
We now instantiate our algorithm as a particular instance of the asynchronous mobile telephone model protocol by implementing the four functions specified by the interface. First we initialize the token set of the process to contain any tokens it knows. We also instantiate the hash function used for creating the advertisement tags:
Next we define the tag function to simply return the hash of the token set that the process knows:
To select a neighbor from those that a process has discovered, the algorithm will first create a filtered set of neighbors to only include those that would be productive to connect to (those neighbors with different token hashes). Then, following the random gossip strategy, it will select one such neighbor uniformly at random. If no productive neighbor exists then the algorithm doesn’t select any neighbor and remains idle. Lastly, note that when a productive neighbor is selected, the algorithm clears its set of known neighbors. As we will see in Lemma IV.3, refreshing the set of known nearby processes minimizes the effect of faulty nodes on performance.
Finally, if two processes form a successful connection, they exchange a single token in the symmetric set difference between their two token sets:
IV-C Asynchronous Random Spread Gossip Analysis
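The four functions described in this subsection can be sketched together as a single class. Several details here are assumptions made for illustration: SHA-256 over the sorted tokens stands in for the unspecified hash function, the neighbor set is modeled as a dictionary from neighbor reference to last advertised tag, the exchanged token is chosen deterministically as the smallest one in the symmetric difference, and `communicate` takes the peer's object directly instead of a network reference.

```python
import hashlib
import random

class RandomSpreadGossip:
    """Illustrative sketch of the asynchronous random spread gossip functions."""

    def __init__(self, tokens, rng=None):
        self.initial_tokens = set(tokens)
        self.rng = rng or random.Random()
        self.tokens = set()

    def initialize(self):
        # Start with whatever tokens this process already knows.
        self.tokens = set(self.initial_tokens)

    def get_tag(self):
        # Tag = hash of the token set. Sorting makes the tag order-independent.
        data = ",".join(sorted(str(t) for t in self.tokens))
        return hashlib.sha256(data.encode()).hexdigest()

    def select(self, neighbors):
        """Pick a productive neighbor uniformly at random, or return None.

        neighbors: dict mapping neighbor reference -> last advertised tag.
        The dict is cleared whenever a neighbor is selected.
        """
        my_tag = self.get_tag()
        productive = sorted(v for v, tag in neighbors.items() if tag != my_tag)
        if not productive:
            return None  # no neighbor advertises a different token set
        choice = self.rng.choice(productive)
        neighbors.clear()
        return choice

    def communicate(self, other):
        """Exchange one token from the symmetric difference of the two token sets."""
        diff = self.tokens ^ other.tokens
        if not diff:
            return
        token = min(diff, key=str)  # arbitrary but deterministic choice (assumption)
        # Exactly one side lacks the token; adding it to both is a no-op for the owner.
        self.tokens.add(token)
        other.tokens.add(token)
```

Repeated calls to `communicate` between the same pair shrink the symmetric difference by one token each time, after which the two tags coincide and `select` stops treating the pair as productive.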
In this section we analyze the above algorithm. We begin with a proof of convergence, showing that in the worst case the asynchronous random spread gossip algorithm spreads all tokens to all nodes in the network in time . We then take advantage of the vertex expansion to demonstrate how it increases the rate at which a single token is spread.
IV-C1 Proof of Convergence
We begin our analysis by showing that the asynchronous random spread algorithm spreads all tokens to the entire network in time at most . Firstly, for our analysis of the asynchronous setting, we will have to redefine our notion of the productive subgraph.
At time , define to be the productive subgraph of the network at this time such that where at time .
Notice that, as in the previous section, we assume that hash collisions, a very low probability event, do not occur. That is: . With this in mind, we establish our first bound (remember in the following that , , and are the relevant maximum time bounds, unknown to the algorithm, for key model behavior).
The asynchronous random gossip algorithm takes time to spread all tokens where is the number of nodes in the network, is the number of tokens to spread, and is the maximum amount of time between iterations of the algorithm loop.
Fix some time . Our goal is to show that within the interval to , at least one node learns a new token. Because this can only occur at most times before all nodes know all tokens, if we can show the above we have established the lemma.
Fix some time . Let be the productive subgraph (see the above definition) at the beginning of this interval. If not all tokens have spread, clearly there exists a node such that the in .
By the guarantees of the model, by time , will have heard advertisements from all neighbors in , and then subsequently looped back to the top of its main connect loop.
For each neighbor in , either adds to its set, or at some point after , and ’s token sets changed such that , preventing from adding . In this case, however, at least one new token was learned by some node and we are done. If this is not the case, then now has a non-empty set.
Going forward, let be the node randomly chooses from this set. If the connection fails, this indicates that is involved in another connection with some other node . If the connection is successful, then and will exchange a token. Either way, a new token is learned by some node in in at most another time.
The total amount of time for some node to learn something new is in , as needed.
Let be the maximum number of faulty nodes in the network. Then the asynchronous random gossip algorithm takes time to spread all tokens.
Again, consider the productive subgraph at a particular time for a node when its set is empty. If no nodes leave the subgraph then is guaranteed to learn of all these neighbors and add them to its set. However, now allow some node in ’s set to experience a failure between times and (if the failure happens before then by the guarantee of the aMTM, will not have received ’s update). Upon entering an iteration of the outer loop, may attempt to connect with since ’s advertisement is still fresh. In this event, which is clearly the worst case, the connection fails and time at most was spent since this is the maximum amount of time the outer loop can possibly take.
This failure can happen in each new iteration of the outer loop for at most time , at which point the advertisement ceases to update ’s neighbor set. Therefore, a single failed node can cause a delay of time at most . Since there are faulty nodes, this introduces a total slowdown of . Therefore, the time for this algorithm to spread all tokens is . Furthermore, if we assume , . ∎
Let be the maximum fraction of neighbors for a node that can be byzantine. Then the asynchronous random gossip algorithm takes time in expectation to spread all tokens.
If the productive subgraph stays connected, the worst event that can occur during the interval of length is that an honest node chooses a byzantine neighbor to connect to. This happens with probability at most and therefore a node engages in a productive, honest connection with probability at least . Consider the series of intervals of time at most and label them with the indicator variables such that:
Therefore, achieving successes would take intervals in expectation. Since each interval takes at most time, the algorithm takes time . ∎
IV-C2 Analysis of Spreading a Single Token
We now analyze the spread of a single token in the network to demonstrate that the performance of the algorithm still improves with the vertex expansion of the network in an asynchronous setting. Our goal in this subsection is to prove the following time bound to spread a single token:
The asynchronous random spread gossip algorithm takes time at most , where is the number of nodes in the network, is the vertex expansion, and is the maximum time required for an iteration of the asynchronous mobile telephone model loop.
Unlike with our analysis of the synchronous algorithm, we cannot directly leverage a productive subgraph that remains stable through synchronized rounds. We must instead identify cores of useful edges amidst the unpredictable churn and argue that over a sufficiently long interval they deliver a sufficiently large number of new tokens.
We accomplish this by fixing the productive subgraph at and observing an interval of length . During this interval, we want to show that for every edge such that is informed and is uninformed, either becomes otherwise informed or returns from Select() with good probability during this interval. Namely, this probability is lower-bounded by the probability would return if included all of ’s neighbors from itself.
For a fixed time and fixed edge such that knows the token and does not, if does not otherwise learn the token in this interval, node returns from Select() uniformly at random from a set of at most nodes where is the degree of in the productive subgraph and the resulting connection attempt concludes no later than time .
Fix the productive subgraph at this time, and fix an informed node and uninformed node . Since is an informed node, all of its edges in the productive subgraph are incident to uninformed nodes. Since nodes never forget the token, the number of uninformed nodes can only decrease. Now consider an execution of Select() before time in which is added to . Since by assumption does not otherwise learn the token in this interval, it must be the case that advertised its uninformed status in this interval and was included in and subsequently , so we know this occurs at least once in the interval (the extra time is to allow an additional iteration of ’s loop before Select() is called). Furthermore, we know that since the number of uninformed neighbors can’t increase from that in the productive subgraph , there can be at most neighbors in . Since returns a particular neighbor from this set with uniform randomness, the probability that returns is at least . Furthermore, regardless of whether the resulting connection attempt is a success or a failure, it finishes in at most additional time for a total maximum time of .
Now that we have quantified the amount of time necessary for a node to successfully connect, we need an estimate for how many connections we can expect to be successful. Similar to our previous analysis, this is dependent on the amount of competition between connection attempts sent to a single node. We begin with a useful graph theory definition.
For a graph , we define the degree weight of a node to be the sum of the weights of all incoming edges, where the weight of each edge is . Formally:
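The edge weight is not reproduced in the text above, so the following sketch assumes one natural choice: the weight of an edge from a sender u to a receiver v is the reciprocal of u's degree, which is the probability that u picks v when it selects uniformly among its neighbors. Under that assumption, a receiver's degree weight is the expected number of connection attempts it receives in one round of uniform selection. The function name and the sender-to-receivers dictionary are hypothetical.

```python
from fractions import Fraction

def degree_weights(sender_adj):
    """Compute each receiver's degree weight under the assumed 1/deg(u) edge weight.

    sender_adj: dict mapping each sender to the set of its receiver neighbors.
    Returns a dict mapping each receiver to the sum of 1/deg(u) over its senders u.
    """
    weights = {}
    for u, receivers in sender_adj.items():
        for v in receivers:
            # Assumed edge weight: reciprocal of the sender's degree.
            weights[v] = weights.get(v, Fraction(0)) + Fraction(1, len(receivers))
    return weights
```

With this weighting, the weights over all receivers sum to the number of senders that have at least one neighbor, since each such sender distributes a total weight of one across its edges.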
We now prove a useful result about one-round random matchings in a bipartite graph that leverages our degree weight definition in its proof.