In this paper, we study the classical capacity problem in the mobile telephone model: an abstraction that models the peer-to-peer communication capabilities implemented in most commodity smartphone operating systems. The capacity of a network expresses how much sustained throughput can be maintained for a set of communication demands. We focus on three variations of the problem: pairwise capacity, in which nodes are divided into pairwise packet flows, broadcast capacity, in which a single source delivers packets to the whole network, and all-to-all capacity, in which all nodes deliver packets to the whole network.
For each variation we prove limits on the achievable throughput and analyze algorithms that match (or nearly match) these bounds. We study these results in both arbitrary networks and random networks generated with the process introduced by Gupta and Kumar in their seminal paper on wireless network capacity [gupta:2000]. Finally, we deploy our new techniques to largely resolve an open question from [newport:2017] regarding optimal one-shot gossip in the mobile telephone model. Below we summarize the problems we study and the results we prove, interleaving the relevant related work.
The Mobile Telephone Model.
The mobile telephone model (MTM), introduced by Ghaffari and Newport [ghaffari:2016], modifies the well-studied telephone model of wired peer-to-peer networks (e.g., [frieze1985shortest, giakkoupis2011tight, chierichetti2010rumour, giakkoupis2012rumor, fountoulakis2010rumor, giakkoupis2014tight]) to better capture the dynamics of standard smartphone peer-to-peer libraries. It is inspired, in particular, by the specific interfaces provided by Apple’s Multipeer Connectivity Framework [multipeer].
In this model, the network is modeled as an undirected graph $G=(V,E)$, where the nodes in $V$ correspond to smartphones, and an edge $\{u,v\} \in E$ indicates the devices corresponding to $u$ and $v$ are close enough to enable a direct peer-to-peer radio link. Time proceeds in synchronous rounds. As in the original telephone model, in each round, each node can either attempt to initiate a connection (e.g., place a telephone call) with at most one of its neighbors, or wait to receive connection attempts. Unlike the original model, however, a waiting node can accept at most one incoming connection attempt. This difference is consequential, as many of the celebrated results of the original telephone model depend on the nodes’ ability to accept an unbounded number of incoming connections (see [ghaffari:2016, daum:2016] for more discussion, and footnote 1). This restriction is motivated by the reality that standard smartphone peer-to-peer libraries limit the number of concurrent connections per device to a small constant (e.g., for Multipeer this limit is ). Once connected, a pair of nodes can participate in a bounded amount of reliable communication (e.g., transfer a constant number of packets/rumors/tokens).
Footnote 1: This behavior is particularly evident in studying PUSH-PULL rumor spreading in the telephone model in a star network topology. This simple strategy performs well in this network due to the ability of the points of the star to simultaneously pull the rumor from the center. In the mobile telephone model, by contrast, any rumor spreading strategy would be fundamentally slower due to the necessity of the center to connect to the points one by one.
Finally, the mobile telephone model also allows each node to broadcast a small -bit advertisement to its neighbors at the start of each round before the connection decisions are made. Most existing smartphone peer-to-peer libraries implement this scan-and-connect architecture. Notice that the mobile telephone model is harder than the original telephone model due to its connection restrictions, but also easier due to the presence of advertisements. The result is that the two settings are formally incomparable: each requires its own strategies for solving key problems.
In recent years, several standard one-shot peer-to-peer problems have been studied in the MTM, including rumor spreading [ghaffari:2016], load balancing [dinitz:2017], leader election [newport:2017], and gossip [newport:2017, newport:2019]. This paper is the first to study ongoing communication in this setting.
The Capacity Problem.
Capacity problems are parameterized with a network topology , and a flow set made up of pairs of the form (each of which is a flow), where indicates a source (sometimes called a sender), and indicates a set of destinations (receivers). For each flow , source is tasked with routing an infinite sequence of packets to destinations in . The throughput achieved by a given destination for a particular flow is the average number of packets it receives from that flow per round in the limit, and the overall throughput is the smallest throughput over all the destinations in all flows (see Section 2.2 for formal definitions). We study three different capacity problems, each defined by the different constraints they place on the flow set .
Results: Pairwise Capacity.
The pairwise capacity problem divides nodes into source and destination pairs in , i.e., the given flows are between pairs of nodes rather than from a source to a general destination set. We begin with pairwise capacity as it was the primary focus of Gupta and Kumar’s seminal paper on the capacity of the protocol and physical wireless network models [gupta:2000]. They argued that it provides a useful assessment of a network’s ability to handle concurrent communication.
We begin in Section 3.1 by tackling the following fundamental problem: given an arbitrary connected network topology graph and a flow set that divides the nodes in into sender and receiver pairs, is it possible to efficiently calculate a packet routing schedule that approximates the optimal achievable throughput? We answer this question in the affirmative by establishing a novel connection between pairwise capacity and the classical concurrent multi-commodity flow (MCF) problem. To do so, we first transform a given and into an instance of the MCF problem. We then apply an existing MCF approximation algorithm to generate a fractional flow that achieves a good approximation of the optimal flow in the network. Finally, we apply a novel rounding procedure to transform the fractional flow into a schedule. We prove that this resulting schedule provides a constant approximation of the optimal achievable throughput.
Inspired by Gupta and Kumar [gupta:2000], in Section 3.2 we turn our attention to networks and flow pairings that are randomly generated using the process introduced in [gupta:2000]. This process is parameterized with a network size and communication radius . It randomly places the nodes in a unit square and adds an edge between any pair of nodes within distance . The source and destination pairs are also randomly generated.
For every given size , we identify a connectivity threshold value , such that for any radius , with constant probability the network generated by the above process for and includes a source with no path to its destination, trivializing the optimal achievable throughput to . We then prove that for every radius that is at least a sufficiently large constant factor larger than the threshold, there is a tight bound of on the optimal achievable throughput. These results fully characterize our algorithm from Section 3.1 in randomly generated networks.
Results: Broadcast Capacity.
Broadcast capacity is another natural communication problem in which a single source node is provided an infinite sequence of packets to deliver to all other nodes in the network. Solutions to this problem would be useful, for example, in a scenario where a large file is being distributed in a peer-to-peer network of smartphone users in a setting without infrastructure. In Section 4.1 we study the optimal achievable throughput for this problem in arbitrary connected graphs. To do so, we connect the scheduling of broadcast packets to existing results on graph toughness, a metric that captures a graph’s resilience to disconnection that was introduced by Chvátal [toughness] in the context of studying Hamiltonian paths.
In more detail, a graph has a -tree if there exists a spanning tree of with maximum degree . Let be the smallest such that has a -tree. This tree is also called a minimum degree spanning tree (MDST) of . Building on a result of Win [win:1989] that relates -trees to toughness, we prove that for any given with , there exists a subset of nodes such that removing from partitions the graph into at least connected components.
As we formalize in Section 4.1, because each node in can connect to at most one component per round (due to the connection restrictions of the mobile telephone model), rounds are required to spread each packet to all components, implying that no schedule achieves throughput better than .
In Section 4.2, we prove this bound tight by exhibiting a matching algorithm. The algorithm begins by constructing a -tree with using existing techniques; e.g., [fr94, dinitz:2019]. It then edge colors and uses the colors as the foundation for a TDMA schedule of length that allows nodes to simulate the more powerful CONGEST model in which each node can connect with every neighbor in a round. In the CONGEST model, a basic pipelined broadcast provides constant throughput. When combined with the simulation cost the achieved throughput is an asymptotically optimal .
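The edge-coloring idea above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm: it assumes the spanning tree is already given, colors its edges greedily from the root, activates one color class per round, and forwards packets directly from parent to child instead of simulating CONGEST. The names `color_tree_edges` and `simulate_broadcast` are hypothetical.

```python
from collections import deque

def color_tree_edges(tree, root):
    """Properly edge-color a tree using max-degree many colors.

    tree: dict node -> set of neighbors (must be a tree).
    Returns (parent, color, max_deg), where color[v] is the color of the
    edge between v and parent[v].
    """
    max_deg = max(len(nbrs) for nbrs in tree.values())
    parent = {root: None}
    color = {}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        used = color.get(u)  # color of the edge to u's parent (None at root)
        palette = (c for c in range(max_deg) if c != used)
        for v in sorted(tree[u]):
            if v == parent[u]:
                continue
            parent[v] = u
            color[v] = next(palette)  # distinct from all other edges at u
            queue.append(v)
    return parent, color, max_deg

def simulate_broadcast(tree, root, rounds):
    """Run a TDMA schedule (one color class per round) and return the number
    of packets that every non-root node holds after the given rounds."""
    parent, color, max_deg = color_tree_edges(tree, root)
    have = {v: 0 for v in tree}  # v holds packets 1..have[v]
    for t in range(rounds):
        active = t % max_deg
        before = dict(have)  # safeguard: one hop per packet per round
        for v, c in color.items():
            if c != active:
                continue
            u = parent[v]
            # The root models the source, which always has a fresh packet.
            if u == root or before[u] > before[v]:
                have[v] = before[v] + 1
    return min(have[v] for v in tree if v != root)

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
```

Because each color class is a matching, every round of this schedule is a legal set of connections in the model, and each tree edge is served once every `max_deg` rounds. On the path above (maximum degree 2) the delivered count grows by roughly one packet every two rounds; on the star (maximum degree 4), by one packet every four rounds.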
It is straightforward for a centralized algorithm to calculate this schedule in polynomial time, but in some cases a pre-computation of this type might be impractical, or require too high of a setup cost (see footnote 2). With this in mind, we also provide a distributed version of this algorithm that converges to throughput in rounds, where is the diameter of the spanning tree and hides polylog factors. The algorithm further converges to an optimal throughput after no more than total rounds, providing a trade-off between setup cost and eventual optimality.
Footnote 2: In the mobile telephone model, all nodes can learn the entire network topology in rounds and then run a centralized algorithm locally to determine their routing behavior. Though this setup cost is averaged out when calculating throughput in the limit, it might be desirable to minimize it in practice.
Finally, in Section 4.3, we study the performance of our algorithm in networks generated randomly using the Gupta and Kumar process summarized above. We prove that for any communication radius sufficiently larger than the connectivity threshold, the network is likely to include an -tree, enabling our algorithms to converge to constant throughput. This result indicates that in evenly distributed network deployments the mobile telephone model is well-suited for high performance broadcast.
Results: All-to-All Capacity.
All-to-all capacity generalizes broadcast capacity such that now every node is provided an infinite sequence of packets it must deliver to the entire network. Solutions to this problem would be useful, for example, in a local multiplayer gaming scenario in which each player needs to keep track of the evolving status of all other players connected in a peer-to-peer network.
Clearly, separate instances of our broadcast algorithm from Section 4.2, one for each of the nodes as the broadcast source, can be interleaved with a round robin schedule to produce throughput. In Section 5, we draw on the same graph theory connections as before to prove that this result is tight for all-to-all capacity. We then provide a less heavy-handed distributed algorithm for achieving this throughput. Instead of interleaving different broadcast instances, it executes distinct instances of all-to-all gossip, one for each packet number, using a flood-based strategy on a low degree spanning tree. Finally, we apply the random graph analysis from Section 4.3 to establish that for sufficiently large communication radius, with high probability, the randomly generated graph supports -throughput, which is trivially optimal in the sense that a receiver can receive at most one new packet per round in our model.
New Results on One-Shot Gossip.
As we detail in Section 5.4, our results on all-to-all capacity imply new lower and upper bounds on one-shot gossip in the mobile telephone model. From the lower bound perspective, they imply that gossiping in graph in the mobile telephone model requires rounds. From the upper bound perspective, when we carefully account for the costs of our routing algorithm applied to spreading only a single packet from each source, we solve the one-shot problem with high probability in the following number of rounds:
where is the diameter of . This algorithm is asymptotically optimal in any graph with and (where is the constant from the polylog in the MDST construction time), which describes a large family of graphs. For all other graphs the solution is at most a polylog factor slower than optimal. This is the first known gossip solution to be optimal, or within log factors of optimal, in all graphs, largely answering a challenge presented by [newport:2017].
Smartphone operating systems include increasingly robust support for opportunistic device-to-device communication through standards such as Apple’s Multipeer Connectivity Framework [multipeer], Bluetooth LE [gomez2012overview], and WiFi Direct [camps2013device]. Though the original motivation for these links was to support information transfer among a small number of nearby phones, researchers are beginning to explore their potential to enable large-scale peer-to-peer networks. Recent work, for example, uses smartphone peer-to-peer networking to provide disaster response [suzuki2012soscast, reina2015survey, lu2016networking], circumvent censorship [firechat], extend internet access [aloi2014spontaneous, oghostpot], support local multiplayer gaming [mark2015peer] and improve classroom interaction [holzer2016padoc].
It remains largely an open question whether or not it will be possible to build large-scale network systems on top of smartphone peer-to-peer links. As originally argued by Gupta and Kumar [gupta:2000], bounds for capacity problems can help resolve such questions for a given network model by establishing the limit to their ability to handle ongoing and concurrent communication. The results in this paper, as well as the novel technical tools developed to prove them, can therefore help resolve this critical question concerning this important emerging network setting.
Here we define our model, the problem we study, and some useful mathematical tools and definitions.
The mobile telephone model describes a smartphone peer-to-peer network topology as an undirected graph $G=(V,E)$. The nodes in $V$ correspond to the smartphone devices, and an edge $\{u,v\} \in E$ implies that the devices corresponding to $u$ and $v$ are within range to establish a direct peer-to-peer radio link. We use $n = |V|$ to indicate the network size.
Executions proceed in synchronous rounds labeled , and we assume all nodes start during round . At the beginning of each round, each node selects an advertisement of size at most bits to broadcast to its neighbors in . After the advertisement broadcasts, each node can either send a connection invitation to at most one neighbor, or wait to receive invitations. A node receiving invitations can accept at most one, forming a reliable pairwise connection. It follows from these constraints that the set of connections in a given round forms a matching.
Once connected, a pair of nodes can perform a bounded amount of reliable communication. For the capacity problems studied in this paper, we assume that a pair of connected nodes can transfer at most one packet over the connection in a given round. We treat these packets as black boxes that can only be delivered in this manner (e.g., you cannot break a packet into pieces, or attempt to deliver it using advertisement bits).
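To make the connection rules concrete, here is a minimal sketch of how one round's invitations resolve into connections. It assumes a waiting node picks uniformly at random among its incoming invitations (the model only requires that it accept at most one), and the function names `run_round` and `is_matching` are hypothetical.

```python
import random

def run_round(adj, choices, rng=None):
    """Resolve one round of connection attempts.

    adj:     dict node -> set of neighbors (undirected graph).
    choices: dict node -> the neighbor it invites, or None if it waits.
    Returns a list of (inviter, acceptor) connections.
    """
    rng = rng or random.Random(0)
    invitations = {}
    for u, v in choices.items():
        if v is None:
            continue
        if v not in adj[u]:
            raise ValueError(f"{u} cannot invite non-neighbor {v}")
        # Only waiting nodes can receive; invitations to a node that is
        # itself inviting are lost.
        if choices.get(v) is None:
            invitations.setdefault(v, []).append(u)
    connections = []
    for v, inviters in invitations.items():
        # Assumption: ties are broken uniformly at random.
        connections.append((rng.choice(sorted(inviters)), v))
    return connections

def is_matching(connections):
    """Check that no node takes part in more than one connection."""
    seen = set()
    for u, v in connections:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True
```

On a star whose four leaves all invite the waiting center, `run_round` returns a single connection, which is the bottleneck described in the discussion of rumor spreading in star topologies.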
We assume when running a distributed algorithm in this model that each computational process (also called a node) is provided a unique ID that can fit into its advertisement and an estimate of the network size. It is provided no other a priori information about the network topology, though any such node can easily learn its local neighborhood in a single round if all nodes advertise their ID.
In this paper we measure capacity as the achievable throughput for various combinations of packet flow and network types. We begin by providing a general definition of throughput that applies to all settings we study. This definition makes use of an object we call a flow set, which is a set (for some ) where each and (for node set ). For a given flow set , each describes a packet flow of type ; i.e., source is tasked with sending packets to all the destinations in set . We refer to the packets from as -packets.
A schedule for a given and describes a movement of packets through the flows defined by . Formally, a schedule is an infinite sequence of directed matchings, on , such that the edges in each are labelled by packets, where we define a packet as a pair with and (i.e., is the ’th packet of type ). We require that the packet labels for a schedule satisfy the property that if edge in is labelled with packet , then there is a path in from to where all edges on the path are labelled with . (It is easy to see by induction that this corresponds precisely to the intuitive notion of packets moving through a mobile telephone network). We say that a packet is received by a node in round if there is an edge which is labelled . A packet is delivered by round if every receives it in some round with .
Given a schedule for a graph and flow set , we can define the throughput achieved as the slowest rate, indicated in packets per round, at which any of the flows in are satisfied in the limit. Formally:
Fix a schedule defined with respect to network topology graph and flow set . We say achieves throughput with respect to and , if there exists a convergence round , such that for every and every packet type :
where is the largest such that for every , packet has been delivered by round .
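The delivered-prefix quantity in this definition can be computed directly from a record of delivery rounds. The sketch below assumes packets of a flow are indexed from 1 and that the omitted condition compares the delivered prefix per round against the target throughput; `delivered_prefix` and `empirical_rate` are hypothetical names.

```python
def delivered_prefix(delivery_round, r):
    """Largest j such that packets 1..j of one flow are all delivered by round r.

    delivery_round: dict packet index -> round by which every destination
    of the flow has received that packet.
    """
    j = 0
    while (j + 1) in delivery_round and delivery_round[j + 1] <= r:
        j += 1
    return j

def empirical_rate(delivery_round, r):
    """Delivered prefix per round (assumed form of the throughput ratio)."""
    return delivered_prefix(delivery_round, r) / r

# Packet 3 arrives before packet 2; packet 4 arrives late.
example = {1: 3, 2: 5, 3: 4, 4: 9}
```

Note that only a contiguous prefix counts: at round 4 in the example, packet 3 has already been delivered but the prefix is still 1 because packet 2 is outstanding.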
The above definition of throughput concerns performance in the limit, since can be arbitrarily large. In some cases, though, we might also be concerned with how quickly we achieve this limit. Our notion of convergence round allows us to quantify this, so we will provide bounds on the convergence round where relevant.
Many of the results in this paper concern algorithms that produce schedules. Our centralized algorithms take and as input and efficiently produce a compact description of an infinite schedule (i.e., an infinitely repeatable finite schedule). Our distributed algorithms assume a computational process running at each node in , and for each , the source is provided an infinite sequence of packets to deliver to . An execution of such a distributed algorithm might contain communication other than the flow packets provided as input; e.g., the algorithm might distributedly (in the mobile telephone model) compute a routing structure to coordinate efficient packet communication. However, a unique schedule can be extracted from each such execution by considering only communication corresponding to the flow packets. (It is here that we leverage the model assumption that the set of connections in a given round is a matching and each connection can send at most one flow packet per round.)
While our definition of throughput is for schedules and not algorithms, we will say that an algorithm achieves throughput if it results in a schedule that achieves throughput .
In the sections that follow, we consider three different types of capacity: pairwise, broadcast, and all-to-all. Each capacity type can be formalized as a set of constraints on the allowable flow sets. For each capacity type we study achievable throughput with respect to both arbitrary and random network topology graphs. In the arbitrary case, the only constraint on the graph is that it is connected. For the random case, we must describe a process for randomly generating the graph. To do so, we use the approach introduced for this purpose by Gupta and Kumar [gupta:2000]: randomly place nodes in a unit square, and then add an edge between all pairs within some fixed radius. Formally:
For a given real value radius , , and network size , the GK network generation process randomly generates a network topology as follows:
Let . Place each of the nodes in uniformly at random in a unit square in the Euclidean plane.
Let , where is the Euclidean distance metric.
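The two steps of this generation process translate into a few lines of code. This is a minimal sketch under the assumption that two nodes are adjacent when their Euclidean distance is at most the radius; the function name `gk_network` and the `seed` parameter are hypothetical.

```python
import math
import random

def gk_network(n, r, seed=None):
    """Generate a random geometric graph in the unit square.

    n: number of nodes, r: connection radius.
    Returns (positions, edges) with edges as a set of pairs (u, v), u < v.
    """
    rng = random.Random(seed)
    # Step 1: place each node uniformly at random in the unit square.
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    # Step 2: connect every pair of nodes within distance r.
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(pos[u], pos[v]) <= r:
                edges.add((u, v))
    return pos, edges
```

Any radius above $\sqrt{2}$ (the diagonal of the unit square) yields a complete graph, while very small radii leave most nodes isolated, which is the regime the connectivity threshold is meant to capture.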
We will use the notation to denote that is a random graph generated by the process. When studying a specific definition of capacity with respect to a network randomly generated with the process, it is necessary to specify how the flow set is generated. Because these details differ for each of the three capacity definitions, we defer their discussion to their relevant sections.
2.3 Mathematical Preliminaries
Several proofs will make use of the following Chernoff bound form:
are independent random variables. Let and . Then,
Graph Theory Preliminaries.
We begin with some basic definitions. Fix some connected undirected graph . We define to be the number of components in . In a slight abuse of notation, we define , for , to be the graph defined when we remove from the nodes in and their adjacent edges. For a fixed integer , we say has a -tree if there exists a spanning tree in with maximum degree . Finally, let be the smallest such that has a -tree. That is, describes the maximum degree of the minimum degree spanning tree (MDST) in .
Several of our capacity results build on a graph metric called toughness, introduced by Chvátal [toughness] in the context of studying Hamiltonian paths. It is defined as follows:
An undirected graph has toughness if is the largest number such that for every : if , then .
Intuitively, to have toughness means that you need to remove nodes for every component you hope to create. Win [win:1989] formalized this by establishing a link between toughness and -trees:
[[win:1989]] For any , if , then has a -tree.
Win’s theorem captures the intuition that a small toughness indicates a small number of strategic node removals can generate a large number of components. This in turn implies the existence of a spanning tree containing some high degree nodes (i.e., the nodes whose removal creates many components). We formalize this intuition with the following straightforward corollary of Win’s theorem:
Fix an undirected graph and degree . If , then there exists a non-empty subset of nodes such that .
Since , the contrapositive of Thm. 2.3 implies that . By the definition of toughness, there exists an such that . For this set, . ∎
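For small graphs, the node set promised by this kind of argument can be found by brute force. The sketch below uses Chvátal's standard definition (the toughness is the minimum of $|S|$ divided by the number of components of the graph with $S$ removed, over all disconnecting sets $S$); it runs in exponential time and is meant only as an illustration. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def components_after_removal(adj, removed):
    """Count connected components once the nodes in `removed` are deleted."""
    removed = set(removed)
    seen = set()
    count = 0
    for s in adj:
        if s in removed or s in seen:
            continue
        count += 1
        stack = [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

def toughness(adj):
    """Return (toughness, witness set), or None if no node set disconnects
    the graph (e.g. a complete graph)."""
    best = None
    nodes = list(adj)
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            c = components_after_removal(adj, subset)
            if c > 1:
                ratio = Fraction(size, c)
                if best is None or ratio < best[0]:
                    best = (ratio, subset)
    return best

star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

For the star, the witness is the center alone: removing one node creates four components, so the toughness is 1/4 and every spanning tree must contain a node of degree four. The 5-cycle has toughness 1 and a spanning tree of maximum degree two.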
3 Pairwise Capacity
In their seminal paper [gupta:2000], Gupta and Kumar approached the question of network capacity by considering the maximum throughput achievable for a collection of disjoint pairwise flows, each consisting of a single source and destination. They studied achievable capacity in both arbitrary networks as well as random networks. In this section, we apply this approach to the mobile telephone model.
To do so, we formalize the pairwise capacity problem as the following constraint on the allowable flow sets (see Section 2.2): for every pair , it must be the case that (i.e., ), and neither nor shows up in any other pair in .
3.1 Arbitrary Networks
We begin by designing algorithms that approximately achieve the maximum achievable throughput in an arbitrary network. For now we will not focus on the convergence time, since our definition of capacity applies in the limit, so we describe the following as a centralized algorithm (the time required for each node to gather the full graph topology and run this algorithm locally to generate an optimal routing schedule is smoothed out over time). But as usual when considering centralized algorithms, we will care about the running time.
Formally, we define the Pairwise Capacity problem to be the optimization problem where we are given a graph and a pairwise flow set , and are asked to output a description of an (infinite) schedule which maximizes the throughput. Our algorithm will in particular output a finite schedule which is infinitely repeated. Our approach is to establish a strong connection between multi-commodity flow and optimal schedules, and then apply existing flow solutions as a step toward generating a near optimal solution for the current network. In other words, we give an approximation algorithm for Pairwise Capacity via a reduction to a multi-commodity flow problem.
There is a (centralized) algorithm for Pairwise Capacity that achieves throughput which is a -approximation of the optimal throughput, for any . The convergence time is and the running time is .
In the maximum concurrent multi-commodity flow (MCMF) problem, we are given a triple , where is a digraph, is a collection of node-pairs (each representing a commodity), and are flow capacities on the edges. Let be the number of commodities. The output is a collection of flows satisfying conservation and capacity constraints. Namely, for each flow and for each vertex where , the flow into a node equals the flow going out: . Also, the flow through each edge is upper bounded by its capacity: . Let be the value of flow , or the total flow of commodity leaving its source. The value of the total flow is , and our goal is to maximize . We refer to as an MCMF flow and the constituent commodity flows as subflows.
The MCMF problem can be solved in polynomial time by linear programming. There are also combinatorial approximation schemes known, and our version of the problem can be approximated within a -factor in time [Madry2010].
We first show how to round an MCMF flow to use less precision while limiting the loss of value. We say that an MCMF flow is -rounded if the flow of each commodity on each edge is an integer multiple of : , for all , and all edges . We show how to produce a rounded flow of nearly the same value.
Let be an MCMF flow and be a number. There is a rounding of to a -rounded flow with value at least , and it can be generated in polynomial time.
We focus on each subflow . By standard techniques, each subflow can be decomposed into a collection of paths and values , with , such that for each edge . Let , for each , and observe that . We form the -rounded flow by , for each edge . It is easily verified that conservation and capacity constraints are satisfied. By the bound on , it follows that the value of the rounded flow is bounded from below by . The value of each flow is trivially bounded from below by (which is achieved by sending of each commodity flow along a single path). Thus, . ∎
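The rounding step in this proof can be sketched in code. The version below assumes each path value is rounded down to the nearest multiple of a granularity parameter, here called `delta` (a hypothetical name, as are the function names), and uses exact rational arithmetic so that the multiples are exact.

```python
from collections import defaultdict
from fractions import Fraction

def round_path_flow(paths, delta):
    """Round each path value down to an integer multiple of delta.

    paths: list of (path, value), where path is a tuple of nodes.
    """
    delta = Fraction(delta)
    rounded = []
    for path, value in paths:
        multiples = Fraction(value) // delta  # floor division gives an int
        rounded.append((path, multiples * delta))
    return rounded

def edge_flow(paths):
    """Sum the path values crossing each directed edge."""
    flow = defaultdict(Fraction)
    for path, value in paths:
        for u, v in zip(path, path[1:]):
            flow[(u, v)] += value
    return dict(flow)

subflow = [(("s", "a", "t"), Fraction(7, 10)), (("s", "b", "t"), Fraction(1, 4))]
rounded = round_path_flow(subflow, Fraction(1, 4))
```

Since every path value only decreases, the rounded flow still respects the edge capacities and conservation holds path by path. In the example the total value drops from 19/20 to 3/4, a loss of less than `delta` per path.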
We now turn to the reduction of Pairwise Capacity to MCMF. Given and , along with a parameter , we form the flow network as follows. The undirected graph is turned into a digraph with two copies of each vertex: and edges . The source/destination pairs carry over: . Finally, capacities of edges in are and , where is the number of source/destination pairs in in which occurs. Observe that there is a one-to-one correspondence between simple paths in and in (modulo the in/out version of the start/end node).
The throughput of any schedule on is at most , where is the largest value such that has MCMF flow of value .
Let be a mobile telephone schedule and let be its throughput. We want to show that has MCMF flow of value ; this is sufficient to imply the lemma. We assume that packets flow along simple paths, and we achieve that by eliminating loops from paths, if necessary. By the throughput definition, there is a round such that for every round and every source/destination pair , the number of -packets delivered by round is at least . Let be the first -packets delivered (necessarily by round ), for each type , and let . For each edge and pair , let be the number of packets in that passed through , from to . Also, for a vertex , let denote the number of -packets originating at , i.e., if and otherwise. Similarly, let be the number of -packets with as its destination. Finally, let be the number of packets in that flow through , but did not originate or terminate at , and observe that .
Define the collection of functions where for each , , for each edge , and , for each vertex in . Observe that the flow corresponds to twice the number of -packets going from to (scaled by factor ). The flow from to corresponds to the number of packets in coming into plus the number of those going out of (scaled by factor ), counting those that go through twice, but those originating or terminating at only once. We claim that is a valid MCMF flow in of value , which implies the lemma. Let () be the amount of type- flow originating (terminating) at , respectively.
First, to verify flow conservation at nodes, consider a type , and observe first that all packets in start at the source and end at the destination .
That is, the flow from each node equals the flow coming in plus the flow generated at the node (noting also that no flow terminates at the node). Similarly, the flow into equals the flow terminating at the node plus the flow going out:
Second, to verify capacity constraints, observe that if is the number of packets that flow through node , then
since needs to handle flowing-through packets in two separate rounds and it can only process a single packet in a round. Thus, the flow through is bounded by
satisfying the capacity constraints.
Finally, it follows directly from the definition of (or ) that the flow value is . ∎
To prove Theorem 3.1 we need to introduce edge multicoloring.
Consider a graph and a color requirement for each edge . An edge multicoloring of is a function that satisfies the following: a) if are adjacent then , and b) , for each edge . The number of colors used is , the size of the support for .
We shall use the following result on edge multicolorings.
[Shannon [Shannon1949]] Given a graph and a color requirement for each edge , there is a polynomial-time algorithm that edge multicolors using at most colors, where .
We can now prove Theorem 3.1.
Proof of Theorem 3.1.
Let be a given Pairwise Capacity instance and let . We perform binary search to find a value such that: a) An -approximate MCMF algorithm produces flow of value at least on , and b) The same does not hold for . The resulting flow is then of value at least . Recall that is the number of commodities, and so .
We then form an edge multicoloring instance on as follows. Each edge requires colors, where and . The weighted degree of each node is then , by node capacity constraints. We apply the algorithmic version of Shannon’s Theorem 3.1 to edge multicolor with at most colors. This induces an initial schedule of length , which is then repeated as needed. Within each rounds, -packets depart from its source .
Let . Consider the situation after round . Observe that each packet is forwarded at least once during each rounds, and thus it is delivered within rounds after it is transmitted from its source, since each path used is simple. Thus, the total number of type- packets that remain in the system in the end is at most a -fraction of the delivered packets. Averaged over the rounds gives throughput of
Hence, the throughput achieved is at least
By Lemma 3.1, the throughput is then a -approximation of optimal.
The computation performed is dominated by the application of Shannon’s algorithm, which runs in time , where is the number of multiedges and is the maximum weighted degree. Here, . Hence, the number of computational steps is at most . The convergence time is . ∎
We note that the factor cannot be avoided in a reduction to flow. Consider the graph on six vertices and edges . The optimal throughput is , with respect to . This corresponds to the directed graph on nine nodes: and edges , and three subflows: . Then, , where , has flow of value 1.
3.2 Random Networks
We now consider achievable throughput for the pairwise capacity problem in networks randomly generated with the process defined in Section 2.2. Following the lead of the original Gupta and Kumar capacity paper [gupta:2000], we assume the flow sets are also randomly generated with uniform randomness and contain all the nodes (i.e., every node shows up as a source or destination). A minor technical consequence of this definition is that it requires us to constrain our attention to even network sizes.
We begin in Section 3.2.1 by identifying a threshold value for the radius below which the randomly generated network is likely to be disconnected, trivializing the achievable throughput to . In Sections 3.2.2 and 3.2.3, we then prove that for any radius value that is at least a sufficiently large constant factor greater than the threshold, with high probability in , the optimal achievable throughput is in .
3.2.1 Connectivity Threshold
When analyzing networks and flows generated by the network generation process, we must consider the radius parameter . If is too small, then we expect a network in which some sources are disconnected from their corresponding destinations, making the best achievable throughput trivially . Here we study a connectivity threshold value , defined with respect to a network size and a constant fraction . We prove that for any , with probability at least , given a network generated by and a random pairwise flow set , there exists at least one pair in that is disconnected.
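The disconnection event can be explored empirically with a minimal sketch. It assumes that the generation process places nodes uniformly at random in the unit square and connects two nodes when their distance is at most the radius, and that a random pairwise flow set is a uniformly random perfect matching of the nodes. All function names are hypothetical.

```python
import random

def random_geometric_graph(n, r, rng):
    """Place n nodes uniformly in the unit square; connect pairs within distance r."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if (pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2 <= r * r]
    return pts, edges

def components(n, edges):
    """Union-find; returns a component representative for every node."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return [find(i) for i in range(n)]

def random_pairing(n, rng):
    """Uniformly random source/destination pairs covering all n nodes (n even)."""
    nodes = list(range(n))
    rng.shuffle(nodes)
    return [(nodes[i], nodes[i + 1]) for i in range(0, n, 2)]

def has_disconnected_pair(n, r, rng):
    """True if some source is in a different component than its destination."""
    _, edges = random_geometric_graph(n, r, rng)
    comp = components(n, edges)
    return any(comp[s] != comp[t] for s, t in random_pairing(n, rng))
```

Averaging `has_disconnected_pair` over many trials for a fixed network size and a range of radii gives a rough empirical picture of where the threshold lies.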
There is some constant so that for every sufficiently large even network size and radius , if and is a random pairwise flow set, then with probability at least there exists such that is disconnected from in .
At a high level, to prove this theorem we divide the unit square into a grid consisting of boxes of side length , and then group these boxes into regions made up of collections of boxes. If a given region has a node in the center box, and all its other boxes are empty, then is disconnected from any node not in its own box. Our proof calculates that for a sufficiently small constant fraction used in the definition of the connectivity threshold, with probability at least , there will be a node such that is isolated as described above, and is part of a source/destination pair with another node located in a different box.
Given this setup, the main technical complexity in the proof is carefully navigating the various probabilistic dependencies. One place where this occurs is in proving the likelihood of empty regions. For sufficiently small values, the expected number of non-empty regions is non-zero, but we cannot directly concentrate on this expectation due to the dependencies between emptiness events. These dependencies, however, are dispatched by leveraging the negative association between the indicator variables describing a region’s emptiness (e.g., if region is not empty, this increases the chance that region is empty). In particular, we will apply the following results concerning negative association derived in [daum:2012] based on the more general results of [dubhashi:1998]:
[[daum:2012, dubhashi:1998]] Consider an experiment in which weighted balls are thrown into bins according to some distribution. Fix some , and let be the indicator random variable defined such that iff there are no more than balls in bin . The variables are negatively associated, and therefore standard Chernoff bounds apply to their sum.
We can proceed to the main proof:
Proof (of Theorem 3.2.1).
We consider the network generated with the threshold connectivity value defined in the theorem statement. Clearly if the network is disconnected for this radius it is also disconnected for smaller radii. We will show the theorem claim holds for for sufficiently large and .
We begin by structuring the unit square into which nodes are randomly placed by the process. First, we divide the unit square into a grid of square boxes of side length (ignoring leftover space). We then partition these boxes into regions made up of collections of boxes (ignoring leftover boxes). Finally, we label these regions , where
For each region , let refer to the center box of the pattern of boxes that defines the region. We call the remaining boxes the boundary boxes for region . We now calculate the probability that process places nodes such that boundary boxes of a given region are all empty.
By the definition of the process, the probability that a given node is placed in a given box is equal to the total area, , of the box. Therefore, the probability is not placed in any of the boundary boxes of a given region is .
Pulling these pieces together with the fact that , it follows that the probability that no node is placed in the boundary boxes of a given region is lower bounded as:
where the second step follows from the well-known inequality that for any (for sufficiently large , it is clear that ). Because we assumed , we can further simplify:
We now lower bound the probability that some region has empty boundary boxes. To do this, we first define the random indicator variables , where iff the boundary boxes of region are empty. Let . We want to lower bound the probability that . By linearity of expectations,
Because each node is equally likely to be placed in each region, we know from Theorem 3.2.1 (with ) that the variables , are negatively associated. Therefore the Chernoff bounds from Theorem 2.3 apply to . In particular, it follows that the probability that is upper bounded by:
For our fixed , it follows that for sufficiently large , two things are true: (and therefore ), and this probability is upper bounded by . Therefore, for sufficiently large , the probability that there are no regions with empty boundary boxes is at most .
Conditioned on the event that a given region has empty boundary boxes, we want to now bound the probability that there exists a source/destination pair such that is in and is not.
For a given , this occurs with probability , where is the probability that is in and is the probability that is not in . Given that (where is the area contained in a box) and is clearly greater than , we crudely bound this product as
So the probability that this splitting event fails to occur for all pairs in is upper bounded by
As before, for our fixed , for sufficiently large this probability is upper bounded by .
We have shown the following two bounds: (1) the probability that there are no regions with empty boundary boxes is at most ; and (2) the probability that, given a region with empty boundary boxes, no pair is split by the region is also at most . We can combine these two failure events with a union bound to establish that the probability that at least one of them occurs is less than , satisfying the theorem statement. ∎
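The first failure event in the proof can be estimated by simulation. The sketch below rests on assumptions that are not recoverable from the text as given here: boxes have side length equal to the radius r, and each region is a 3 × 3 block of boxes whose middle box is the center box. The function names are hypothetical.

```python
import math
import random

def regions_with_empty_boundary(points, r):
    """Count regions (assumed 3x3 blocks of r-by-r boxes) whose 8 boundary boxes are empty."""
    boxes_per_side = int(math.floor(1.0 / r))  # leftover space is ignored
    regions_per_side = boxes_per_side // 3     # leftover boxes are ignored
    blocked = set()                            # regions with a node in a boundary box
    for x, y in points:
        bx, by = int(x // r), int(y // r)
        rx, ry = bx // 3, by // 3
        if rx >= regions_per_side or ry >= regions_per_side:
            continue                           # node landed in leftover boxes
        if (bx % 3, by % 3) != (1, 1):         # any box other than the center box
            blocked.add((rx, ry))
    return regions_per_side ** 2 - len(blocked)

def estimate_some_region_empty(n, r, trials, seed=0):
    """Monte Carlo estimate of the probability that at least one region has empty boundary boxes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        if regions_with_empty_boundary(pts, r) > 0:
            hits += 1
    return hits / trials
```

Under these assumptions, a node in the center box of a counted region is at distance greater than r from every node outside its own box, which is the isolation property the proof relies on.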
3.2.2 Bound on Achievable Throughput
In the previous section, we identified a radius threshold below which a randomly generated network is likely to disconnect a source and destination, reducing the achievable throughput to a trivial . Here we study the properties of the networks generated with radius values on the other side of this threshold. In particular, we show that for any radius , with high probability, the randomly generated network and flow set will allow an optimal throughput bounded by . The intuition for this argument is that if nodes are evenly distributed in the unit square, a constant fraction of senders will have to deliver packets from one half of the square to the other, necessarily requiring many packets to flow through a small column in the center of the square, bounding the achievable throughput.
For every sufficiently large even network size and radius , given a network and a random pairwise flow set , the throughput of every schedule (w.r.t. and ) is with high probability.
To build up to this proof, we consider a series of helper lemmas. These results assume that we divide the unit square into three columns (regions of height 1) such that the center region has width and the two outer regions width . We first show that, in expectation, there are many source/destination pairs such that all paths between the source and destination require a node in the center region to send a packet to a node in an outer region. Slightly more formally, we say that a source/destination pair requires a node in the center region if every path from to in contains at least one node from the center region.
For the lemmas that follow, since the theorem is trivially true for constant , we assume without loss of generality that is relatively small (e.g., ).
For a particular source/destination pair , the probability that requires a node from the center region is at least .
Note that requires a node in the center region if one of the following two disjoint events occurs: and are in different outer regions, or is in the center region but is in an outer region. The first event is sufficient since the width of the center region means that there are no edges between the two outer regions, while the second event is sufficient since every path includes .
The first event occurs with probability , and the second event occurs with probability . Thus the total probability that every path includes an outgoing edge from a node in the center region is at least . ∎
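The two event probabilities can be checked numerically. The sketch below treats the center width as a free parameter `w` (a hypothetical name, since the value used in the proof is not reproduced here) and assumes the source and destination positions are independent and uniform in the unit square, so only their x-coordinates matter.

```python
import random

def crossing_probability(w):
    """Exact probability of the two disjoint events for center width w, 0 < w < 1."""
    outer = (1.0 - w) / 2.0                 # width of each outer column
    different_outer = 2.0 * outer * outer   # s left and t right, or s right and t left
    source_in_center = w * (1.0 - w)        # s in the center, t in either outer column
    return different_outer + source_in_center

def column(x, w):
    """Which of the three columns contains x-coordinate x."""
    left_edge = (1.0 - w) / 2.0
    if x < left_edge:
        return "left"
    if x <= left_edge + w:
        return "center"
    return "right"

def estimate_crossing_probability(w, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s, t = column(rng.random(), w), column(rng.random(), w)
        in_different_outer = s != "center" and t != "center" and s != t
        source_center_only = s == "center" and t != "center"
        if in_different_outer or source_center_only:
            hits += 1
    return hits / trials
```

As `w` shrinks toward zero the exact value approaches 1/2, which matches the intuition that roughly half of the pairs straddle a narrow center column.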
Next we relate this probability to the number of such source/destination pairs.
With very high probability, the number of source/destination pairs in that require a node in the center region is at least .
For each source/destination pair , let be an indicator random variable for the union of the two events analyzed in Lemma 3.2.2, such that . Observe that these events are independent, and let denote the total number of pairs where . By linearity of expectation, we know that . So the Chernoff bound from Theorem 2.3 implies that
Therefore, with very high probability, the number of source/destination pairs that meet the conditions of Lemma 3.2.2 is for . Furthermore, since by Lemma 3.2.2 each of these source/destination pairs requires a node in the center region, the number of pairs as described by the lemma statement is also . ∎
Now that we have successfully lower bounded the number of source/destination pairs that require a node in the center region to send a packet to a node in an outer region, we need an estimate for how many nodes in the center region exist to send these packets at one time.
With high probability, there are nodes in the center region.
Let be a random variable denoting the number of vertices in the center region. Each node is put into the center region independently with probability , and thus . Since the placement of each node is independent, we can use the Chernoff bound from Theorem 2.3 to get that . Thus with very high probability, there are at most nodes in the center region. ∎
We now have everything we need to upper bound the pairwise throughput.
Proof (of Theorem 3.2.2).
From Lemma 3.2.2 we know that, with high probability, there are source/destination pairs that require one of the nodes in the center region. Since each of these nodes can send at most one packet per round by the constraints of the mobile telephone model, by round at most packets can be delivered. Therefore, on average for each source/destination pair , the number of packets delivered by round is . Thus in any schedule there must exist some so that at round , only packets from have been delivered to , and hence the throughput is only . ∎
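The counting step in this proof can be written out as a short derivation. The symbols are introduced here for illustration only: c is the number of nodes in the center region, k is the number of pairs that require the center region, T is the number of rounds elapsed, and delivered_i(T) is the number of packets of pair i delivered by round T.

```latex
% c, k, T and delivered_i(T) are illustrative names, not the paper's notation.
% Every delivered packet of such a pair was sent at least once by a center node,
% and each center node sends at most one packet per round.
\[
  \sum_{i=1}^{k} \mathrm{delivered}_i(T) \;\le\; c \cdot T
  \quad\Longrightarrow\quad
  \min_{1 \le i \le k} \mathrm{delivered}_i(T) \;\le\; \frac{c\,T}{k}
  \quad\Longrightarrow\quad
  \text{throughput} \;\le\; \frac{c}{k}.
\]
```

The middle step is the averaging argument: the minimum over the k pairs is at most their average.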
3.2.3 Tightness of the Throughput Bound
In Section 3.2.2, we proved an upper bound of on the achievable throughput in a network generated by , for , and random pairwise flows. Here we show this result is tight by showing how to produce a schedule that achieves throughput in with respect to a random and . Formally: There exists a constant such that, for any sufficiently large network size and radius , if and is a random pairwise flow set, then with high probability in there exists a schedule that achieves throughput in with respect to and .
At a high level, our argument divides the unit square into boxes of side length . We prove that with high probability, both nodes and pairwise demands are evenly distributed among the boxes. This allows a schedule that efficiently moves many packets in parallel up and down columns to the row of their destination, and then moves these packets left and right along the rows to reach their destination. The time required for a given packet to make it to its destination is bounded by the column and row length of , yielding an average throughput in . The core technical complexity of this argument is the careful manner in which packets are moved onto and off a set of parallel paths while avoiding more than a small amount of congestion at any point in their routing.
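The column-then-row routing pattern can be sketched at the level of boxes. This is only the path shape, not the full schedule: it ignores how packets are moved between nodes within a box and how congestion is controlled. The function name `box_route` and the (row, column) indexing are assumptions.

```python
def box_route(src, dst):
    """Boxes visited from src to dst, each given as (row, col).

    The packet first moves along its column until it reaches the destination
    row, then moves along that row to the destination column.
    """
    (r, c), (dst_r, dst_c) = src, dst
    path = [(r, c)]
    while r != dst_r:                  # vertical phase: stay in the source column
        r += 1 if dst_r > r else -1
        path.append((r, c))
    while c != dst_c:                  # horizontal phase: stay in the destination row
        c += 1 if dst_c > c else -1
        path.append((r, c))
    return path
```

The number of box-to-box hops is the row distance plus the column distance, so it never exceeds the sum of the grid's column length and row length.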
Our approach is to isolate the probabilistic elements of the proof. To do so, we need some preliminary definitions to help structure our argument. We begin by fixing a canonical way of covering the unit square into which the process places nodes with a grid.