# Greedy Gossip with Eavesdropping

This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average consensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.


## I Introduction and Background

(Portions of this work were presented in [1], [2], [3].)

Distributed consensus is recognized as a fundamental problem of distributed control and signal processing applications (see, e.g., [4, 5, 6, 7, 8, 9, 10] and references therein). The prototypical example of a consensus problem is computation of the average consensus: in a network of n nodes, each node i initially holds a scalar data value y_i, and the goal is to find a distributed algorithm that asymptotically computes the average, ȳ = (1/n)∑_{i=1}^n y_i, at every node i. Such an algorithm can further be used for computing linear functions of the data and can be generalized for averaging vectorial data.

One of the algorithms proposed for solving the average consensus problem is distributed averaging [11]. In distributed averaging, every node in the network broadcasts its information at each iteration so that neighboring nodes can receive and use this information for their updates. However, with this scheme the speed of information diffusion across the network is slow for topologies used to model wireless mesh and sensor networks. The information at each node typically does not change much from iteration to iteration. Hence, the broadcast medium is not efficiently used. Recently, gossip algorithms have gained attention for the computation of average consensus [12, 10]. In contrast to distributed averaging, gossip algorithms allow only two neighboring nodes to communicate and exchange information at each iteration. Restricting all information exchange to be local in this fashion is attractive from the point of view of simplicity and robustness (e.g., to changing topology and unreliable network conditions).

In this paper we propose a new randomized gossip algorithm, greedy gossip with eavesdropping (GGE), for average consensus computation. Unlike previous randomized gossip algorithms, which perform updates completely at random, GGE takes advantage of the broadcast nature of wireless communications and implements a greedy neighbor selection procedure. We assume a broadcast transmission model such that all neighbors within range of a transmitting node receive the message. Thus, in addition to keeping track of its own value, each node tracks its neighbors' values by eavesdropping on their transmissions. At each iteration, the activated node uses this information to greedily choose the neighbor with which it will gossip, selecting the neighbor whose value is most different from its own. Accelerating convergence in this myopic way does not bias the computation and does not rely on geographic location information, which may change in networks of mobile nodes.

Although GGE is a powerful yet simple variation on gossip-style algorithms, analyzing its convergence behavior is non-trivial. The main reason is that each GGE update depends explicitly on the values at each node (via the greedy decision of which neighbor to gossip with). Thus, the standard approach to proving convergence to the average consensus solution (i.e., expressing updates in terms of a linear recursion and then imposing properties on this recursion) cannot be applied to guarantee convergence of GGE. To prove convergence, we demonstrate that GGE updates correspond to iterations of a distributed randomized incremental subgradient optimization algorithm. Similarly, analyzing the convergence rate of GGE requires a different approach than the standard one of examining the mixing time of a related Markov chain. We develop a bound relating the rate of convergence of GGE to the rate of standard randomized gossip. The bound indicates that GGE always converges faster than randomized gossip, a finding supported by simulation results. We also provide a worst-case bound on the rate of convergence of GGE. For other gossip algorithms the rate of convergence is generally characterized as a function of the second largest eigenvalue of a related stochastic matrix. In the case of GGE, our worst-case bound characterizes the rate of convergence in terms of a constant that is completely determined by the network topology. We investigate the behavior of this constant empirically for random geometric graph topologies and derive lower bounds that provide some characterization of its scaling properties.

### I-A Background and Related Work

Randomized gossip was proposed in [12] as a decentralized asynchronous scheme for solving the average consensus problem. At the kth iteration of randomized gossip, a node s_k is chosen uniformly at random. It chooses a neighbor, t_k, randomly, and this pair of nodes "gossips": s_k and t_k exchange values and perform the update x_{s_k}(k) = x_{t_k}(k) = (x_{s_k}(k−1) + x_{t_k}(k−1))/2, while all other nodes remain unchanged. One can show that under very mild conditions on the way the random neighbor, t_k, is drawn, the values x_i(k) converge to ȳ at every node as k → ∞ [11]. Because of the broadcast nature of wireless transmission, other neighbors overhear the messages exchanged between the active pair of nodes, but they do not make use of this information in existing randomized gossip algorithms.
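As a concrete illustration, the pairwise update just described can be sketched in a few lines of Python (a toy simulation of our own; the ring topology and initial values are invented for the example):

```python
import random

def randomized_gossip_step(x, neighbors, rng):
    """One randomized gossip iteration: a uniformly chosen node averages
    its value with a uniformly chosen neighbor; all others are unchanged."""
    s = rng.randrange(len(x))      # activated node s_k
    t = rng.choice(neighbors[s])   # neighbor t_k, chosen uniformly at random
    x[s] = x[t] = 0.5 * (x[s] + x[t])
    return x

# Toy 4-node ring topology.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
x = [4.0, 0.0, 2.0, 6.0]           # initial values; the average is 3.0
rng = random.Random(0)
for _ in range(200):
    randomized_gossip_step(x, neighbors, rng)
print(x)  # every entry is now close to the average 3.0
```

Because each update replaces two values by their mutual average, the network sum, and hence the average, is invariant from iteration to iteration; this is why the algorithm can only converge to ȳ.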

The convergence rate of randomized gossip is characterized by relating the algorithm to a Markov chain [12]. The mixing time of this Markov chain is closely related to the averaging time of the gossip algorithm, and therefore defines the rate of convergence. For certain types of graph topologies, the mixing times are small and convergence of the gossip algorithm is fast; the complete graph is the canonical example. However, topologies such as random geometric graphs [13] or grids are more realistic for wireless applications. Boyd et al. [12] prove that for random geometric graphs, randomized gossip requires on the order of n² transmissions to approximate the average consensus well.³

³Throughout this paper, when we refer to randomized gossip we specifically mean the natural random walk version of the algorithm, where the node t_k is chosen uniformly from the set of neighbors of s_k at each iteration. For random geometric graph topologies, which are of most interest to us, Boyd et al. [12] prove that the performance of the natural random walk algorithm scales order-wise identically to that of the optimal choice of transition probabilities, so there is no loss of generality.

Motivated by the slow convergence of randomized gossip, Dimakis et al. introduced geographic gossip in [10]. Geographic gossip enables information exchange over multiple hops under the assumption that nodes know their own geographic locations and those of their neighbors. It has been shown that this long-range information exchange improves the rate of convergence to roughly n^{3/2} transmissions (up to logarithmic factors) for random geometric graphs. However, geographic gossip involves overhead due to localization and geographic routing. Furthermore, the network needs to provide reliable two-way transmission over many hops; otherwise, messages lost in transit will bias the average consensus computation.

Recently, other fast gossiping algorithms have also been proposed. Most closely related is the work of Li and Dai [14] and Jung et al. [15]. Both approaches direct the exchange of information across the network by constructing lifted Markov chains using knowledge of the geographic locations of nodes. As an extension to geographic gossip, Benezit et al. [16] have recently proposed averaging along paths, an algorithm which converges in an order-optimal number of transmissions. All of these approaches rely on geographic information and thus are not suitable for scenarios where nodes are mobile or location information is unavailable. The focus of our work is on providing fast and communication-efficient computation that exploits broadcast transmissions rather than geo-location information to gossip quickly.

Aysal et al. have proposed broadcast gossip, a consensus algorithm that also makes use of the broadcast nature of wireless networks [17, 18]. At each iteration, a node is activated uniformly at random to broadcast its value. All nodes within transmission range of the broadcasting node calculate a weighted average of their own value and the broadcast value, and they update their local value with this weighted average. Broadcast gossip does not preserve the network average at each iteration. It achieves a low variance (i.e., rapid convergence), but introduces bias: the value to which broadcast gossip converges can differ significantly from the true average.

Sundhar Ram et al. introduced a general class of incremental subgradient algorithms for distributed optimization in [19]. In this study, the effects of stochastic errors (e.g., due to quantization) on the convergence of consensus-like distributed optimization algorithms are investigated. Convergence of their algorithm is guaranteed under certain conditions on the errors, but the convergence rates are not characterized.

Nedić and Ozdaglar have also developed a distributed form of incremental subgradient optimization that generalizes the consensus framework [20]. Our problem formulation is not as general as theirs, but with the specific formulation addressed in this paper we achieve stronger results. In particular, our cost function has a specific form and, by exploiting it, we are able to guarantee convergence to an optimal solution and obtain tight bounds on the rate of convergence as a function of the network topology.

### I-B Paper Organization

The remainder of this paper is organized as follows. Section II introduces the formal definition of the algorithm. Section III presents two bounds; the first relates the performance of GGE to randomized gossip and indicates that GGE always outperforms randomized gossip, and the second is a worst-case upper bound on the rate of convergence of GGE in terms of a topology-dependent constant. Results from numerical simulations are presented in Section IV. Motivated by these results we provide a multi-hop extension to our algorithm in Section V. Section VI summarizes the contributions of the paper and discusses future work.

## II Greedy Gossip with Eavesdropping (GGE)

We consider a network of n nodes and represent network connectivity as a graph G = (V, E), with vertices V = {1, …, n} and edge set E such that (i, j) ∈ E if and only if nodes i and j directly communicate. We assume that communication relationships are symmetric and that the graph is connected. Let N_i denote the set of neighbors of node i (not including i itself). Each node in the network has an initial value y_i, and the goal of the gossip algorithm is to use only local broadcast exchanges to arrive at a state where every node knows the average ȳ = (1/n)∑_{i=1}^n y_i. To initialize the algorithm, each node sets its gossip value to x_i(0) = y_i.

At the kth iteration of GGE, a node s_k is chosen uniformly at random from {1, …, n}. (This can be accomplished using the asynchronous time model described in [21], where each node "ticks" according to a Poisson clock with rate 1.) Then, s_k identifies a neighboring node t_k satisfying

 t_k \in \arg\max_{t \in N_{s_k}} \left\{ \tfrac{1}{2}\bigl(x_{s_k}(k-1) - x_t(k-1)\bigr)^2 \right\}, \qquad (1)

which is to say, s_k identifies a neighbor whose value currently differs most from its own. This choice is possible because each node i maintains not only its own local variable, x_i(k−1), but also a copy of the current values at its neighbors, x_t(k−1), for t ∈ N_i. When s_k has multiple neighbors whose values are all equally (and maximally) different from its own, it chooses one of these neighbors at random. Then s_k and t_k perform the update

 x_{s_k}(k) = x_{t_k}(k) = \tfrac{1}{2}\bigl(x_{s_k}(k-1) + x_{t_k}(k-1)\bigr), \qquad (2)

while all other nodes hold their values at x_i(k) = x_i(k−1). Finally, the two nodes, s_k and t_k, broadcast these new values so that their neighbors have up-to-date information.

Note that one GGE iteration could be accomplished with just two transmissions, since s_k already knows the values at its neighbors: one broadcast from s_k to t_k, notifying it of the change and simultaneously announcing the new value to all of s_k's neighbors, and one broadcast from t_k to its neighbors to echo the new value to them. However, in networks with unreliable transmission, or in systems where nodes periodically shut off their radios to conserve energy, s_k may miss some transmissions from its neighbors and thus may not always have accurate information about their values. In this case, mistakes in the calculation at s_k under the two-transmission scheme just described would introduce errors, biasing the consensus computation. To make our algorithm more robust and address the case where s_k does not precisely know the values at all of its neighbors, we assume a three-transmission version of our scheme throughout the rest of this paper: one transmission from s_k to t_k to initiate gossiping, one from t_k to its neighbors to inform them of the new value, and one from s_k to its neighbors to inform them of its new value. We comment further on this issue in Section IV-C.
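A minimal sketch of one GGE iteration in Python (our own toy code, not the paper's implementation; here each node reads its neighbors' values directly from the shared list `x`, standing in for the locally cached copies learned by eavesdropping):

```python
import random

def gge_step(x, neighbors, rng):
    """One GGE iteration: the activated node gossips with the neighbor
    whose value differs most from its own (ties broken at random)."""
    s = rng.randrange(len(x))                       # activated node s_k
    gap = max((x[s] - x[t]) ** 2 for t in neighbors[s])
    candidates = [t for t in neighbors[s]
                  if (x[s] - x[t]) ** 2 == gap]     # maximally different neighbors
    t = rng.choice(candidates)                      # greedy choice t_k, tie-break at random
    x[s] = x[t] = 0.5 * (x[s] + x[t])               # pairwise average, as in (2)
    # In the wireless setting both nodes now broadcast their new value,
    # so every neighbor's eavesdropped copy stays up to date.
    return x

neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # toy 4-node ring
x = [4.0, 0.0, 2.0, 6.0]                                  # average is 3.0
rng = random.Random(1)
for _ in range(200):
    gge_step(x, neighbors, rng)
print(x)  # all entries near the average 3.0
```

As with randomized gossip, the pairwise average preserves the network sum, so greediness accelerates convergence without biasing the limit.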

### II-A Initialization

Calculating the greedy update in (1) requires nodes to know their neighbors' values. As in other randomized gossip algorithms, we assume that at the outset of the computation each node i has already discovered its neighbor set, N_i, but does not know its neighbors' values. Instead, these values are learned during an initialization phase. During this phase, the node s_k that is activated at iteration k chooses t_k randomly from the subset of its neighbors whose values it does not yet know, rather than performing a GGE update. Since s_k and t_k broadcast their new values after averaging, the nodes in their neighborhoods overhear and acquire information accordingly. Once s_k has heard from all of its neighbors, the initialization process is complete for that particular node and it chooses t_k greedily via (1) for all subsequent iterations.

Figure 1 illustrates the effects of this initialization scheme relative to an idealized initialization scheme and a naïve initialization scheme. In the idealized scheme, each node clairvoyantly knows all of its neighbors' initial values, and all nodes immediately commence with greedy updates. In the naïve scheme, before any node begins gossiping, all nodes broadcast once (without performing an update) to inform their neighbors of their starting values. The results shown correspond to a network of n nodes.⁴ Observe that the proposed scheme and the naïve scheme attain similar performance, and both involve an overhead of roughly n transmissions, which is not substantial relative to the total number of transmissions required. We prefer the proposed initialization scheme to the naïve broadcast approach because it does not require any scheduling mechanism.

⁴The topology is generated randomly from the family of random geometric graphs on n nodes with the standard connectivity radius. See Section IV for further details on the simulations, and note that the setup used here is the same as that used to generate Figure 2(a). Although not reported here due to space constraints, we observe similar behavior on other network topologies.

## III Convergence Analysis

### III-A Convergence of GGE

To derive convergence results, we interpret GGE as a randomized incremental subgradient method [22]. Consider the constrained optimization problem

 \min_{x \in \mathbb{R}^n} \sum_{i=1}^n f_i(x) \quad \text{subject to } x \in \mathcal{X},

where we assume that each f_i is a convex function, not necessarily differentiable, and \mathcal{X} is a non-empty convex subset of \mathbb{R}^n. An incremental subgradient algorithm for solving this optimization problem is an iterative algorithm of the form

 x(k) = P_{\mathcal{X}}\bigl[x(k-1) - \alpha_k\, g(s_k, x(k-1))\bigr], \qquad (3)

where α_k is the step size, g(s_k, x(k−1)) is a subgradient⁵ of f_{s_k} at x(k−1), and P_X projects its argument onto the set X. The algorithm is randomized when the component updated at each iteration, s_k, is drawn uniformly at random from the set {1, …, n}, independently of x(k−1). Intuitively, the algorithm resembles gradient descent, except that instead of taking a descent step in the direction of the gradient of the full cost function ∑_i f_i, at each iteration we focus on a single component, f_{s_k}. The projection P_X ensures that each new iterate is feasible. Under mild conditions on the sequence of step sizes, α_k, and on the regularity of each component function f_i, Nedić and Bertsekas have shown that the randomized incremental subgradient method described above converges to a neighborhood of the global minimizer [22].

⁵Subgradients generalize the notion of a gradient to non-smooth functions. A subgradient of a convex function f at x is any vector g that satisfies f(y) ≥ f(x) + ⟨g, y − x⟩ for all y. The set of subgradients of f at x is referred to as the subdifferential and is denoted ∂f(x). If f is differentiable at x, then ∂f(x) = {∇f(x)}; i.e., the only subgradient of f at x is the gradient. A necessary and sufficient condition for x* to be a minimizer of the convex function f is that 0 ∈ ∂f(x*). See [22] and references therein.

GGE is a randomized incremental subgradient algorithm for the problem

 \min_{x \in \mathbb{R}^n} \sum_{i=1}^n \max_{j \in N_i} \left\{ \tfrac{1}{2}(x_i - x_j)^2 \right\} \qquad (4)
 \text{subject to } \sum_{i=1}^n x_i = \sum_{i=1}^n y_i, \qquad (5)

where y_i is the initial value at node i. The objective function in (4) has a minimum value of 0, which is attained when x_i = x_j for all i, j. Thus, any minimizer is a consensus solution. Moreover, the constraint (5) ensures that the unique global minimizer is the average consensus vector x̄.

To connect the GGE update, (2), with the incremental subgradient update, (3), observe that g(k), a subgradient of f_{s_k} at x(k−1), is given componentwise by

 g_i(k) = \begin{cases} x_{s_k}(k-1) - x_{t_k}(k-1) & \text{for } i = s_k, \\ -\bigl(x_{s_k}(k-1) - x_{t_k}(k-1)\bigr) & \text{for } i = t_k, \\ 0 & \text{otherwise.} \end{cases} \qquad (6)

Here the subscript i indexes the components of the vector g(k). Fixing a constant step size α_k = 1/2, the update (3) is identical to (2). The recursive update for GGE thus has the form

 x(k) = x(k-1) - \tfrac{1}{2}\, g(k). \qquad (7)

Note that the projection P_X is unnecessary, since the step size choice α_k = 1/2 ensures that the constraint (5) is satisfied at each iteration.
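To make the correspondence concrete, the following small numeric check (our own, with made-up values; not from the paper) verifies that one step of (7), with the subgradient (6), reproduces the pairwise average of (2) and preserves the constraint (5):

```python
# Toy configuration: node s gossips with its greedily chosen neighbor t.
x = [4.0, 0.0, 2.0, 6.0]
s, t = 0, 1                     # suppose s_k = 0 and its greedy choice is t_k = 1

# Subgradient of f_s at x, as in (6): nonzero only in components s and t.
g = [0.0] * len(x)
g[s] = x[s] - x[t]
g[t] = -(x[s] - x[t])

# Incremental subgradient step with constant step size 1/2, as in (7).
x_new = [xi - 0.5 * gi for xi, gi in zip(x, g)]

print(x_new)                    # [2.0, 2.0, 2.0, 6.0]: components s and t equal (x_s + x_t)/2
print(sum(x_new) == sum(x))     # True: the constraint sum(x) = sum(y) is preserved
```

Components s and t both land on the pairwise average, exactly the GGE update (2), while every other component is untouched.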

Nedić and Bertsekas study the convergence behavior of randomized incremental subgradient algorithms in [22]. For a constant step size, their analysis only guarantees that the iterates reach a neighborhood of the optimal solution whose radius is controlled by the step size and by an upper bound on the norm of the subgradients [22]. We wish to show that x(k) actually converges to the average consensus, x̄, the global minimizer of our optimization problem, and not just to a neighborhood of x̄. By exploiting the specific form of the GGE cost function, we are able to prove the following stronger result.

###### Theorem 1

Let x(k) denote the sequence of iterates produced by GGE. Then x(k) → x̄ almost surely.

To begin, we examine the improvement in squared error after one GGE iteration. Expanding x(k) via the expression (7), we have

 \|x(k) - \bar{x}\|^2 = \|x(k-1) - \tfrac{1}{2}g(k) - \bar{x}\|^2
 = \|x(k-1) - \bar{x}\|^2 - \langle x(k-1) - \bar{x},\, g(k) \rangle + \tfrac{1}{4}\|g(k)\|^2.

Based on the definition of g(k) in (6), given s_k and t_k, we have

 \|g(k)\|^2 = 2\bigl(x_{s_k}(k-1) - x_{t_k}(k-1)\bigr)^2,

and

 \langle x(k-1) - \bar{x},\, g(k) \rangle = \sum_{i=1}^n \bigl(x_i(k-1) - \bar{x}_i\bigr)\, g_i(k) = \bigl(x_{s_k}(k-1) - x_{t_k}(k-1)\bigr)^2.

Therefore, we have

 \|x(k) - \bar{x}\|^2 = \|x(k-1) - \bar{x}\|^2 - \tfrac{1}{4}\|g(k)\|^2 \qquad (8)

with probability 1, since the expression holds regardless of the values of s_k and t_k. Recursively applying our update expression, we find that, w.p. 1,

 \|x(k) - \bar{x}\|^2 = \|x(k-1) - \bar{x}\|^2 - \tfrac{1}{4}\|g(k)\|^2
 = \|x(k-2) - \bar{x}\|^2 - \tfrac{1}{4}\sum_{j=k-1}^{k}\|g(j)\|^2
 \;\;\vdots
 = \|x(0) - \bar{x}\|^2 - \tfrac{1}{4}\sum_{j=1}^{k}\|g(j)\|^2.

Since \|x(k) - \bar{x}\|^2 \ge 0, we must have

 \sum_{j=1}^{k}\|g(j)\|^2 \le 4\|x(0) - \bar{x}\|^2

w.p. 1, and, consequently, the series ∑_j ‖g(j)‖² converges a.s. as k → ∞. Since each term is non-negative, this also implies that ‖g(k)‖² → 0 a.s. as k → ∞. However, by definition, g(k) is a subgradient of a convex function, and a vanishing subgradient is both a sufficient and necessary condition for the corresponding point to be a global minimizer. Thus, g(k) → 0 a.s. implies that x(k) → x̄ a.s., since x̄ is the unique minimizer of (4)–(5).
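The identity (8) can also be checked numerically. This sketch (our own toy values) confirms that a single pairwise-averaging update reduces the squared error by exactly one quarter of the squared subgradient norm:

```python
x = [4.0, 0.0, 2.0, 6.0]
xbar = sum(x) / len(x)                         # average consensus value, 3.0
err_before = sum((xi - xbar) ** 2 for xi in x)

s, t = 0, 1                                    # gossiping pair (s_k, t_k)
g_norm_sq = 2 * (x[s] - x[t]) ** 2             # ||g(k)||^2, from (6)
x[s] = x[t] = 0.5 * (x[s] + x[t])              # GGE update (2)

err_after = sum((xi - xbar) ** 2 for xi in x)
print(err_before - err_after)                  # 8.0, which equals g_norm_sq / 4
```

Here the error drops from 20.0 to 12.0, matching ¼‖g(k)‖² = 32/4 = 8, as (8) predicts.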

### III-B Convergence Rate: GGE vs. Randomized Gossip

The following theorem establishes a general expression for a bound on the mean squared error of GGE after k iterations. Moreover, it demonstrates that the upper bound on the MSE of GGE is less than or equal to the corresponding upper bound for randomized gossip.

The GGE updates can also be expressed in the form x(k) = W(k)x(k−1), where W(k) is a stochastic matrix with W_{s_k,s_k}(k) = W_{s_k,t_k}(k) = W_{t_k,s_k}(k) = W_{t_k,t_k}(k) = 1/2, W_{i,i}(k) = 1 for all i ∉ {s_k, t_k}, and 0 elsewhere. We denote the application of k successive GGE updates by W_GGE(1:k) = W(k)W(k−1)⋯W(1). Likewise, let W_RG(1:k) denote the successive application of k randomized gossip updates. Let W̄ denote the expected value of the randomized gossip update matrix, and let λ₂(W̄) denote the second largest eigenvalue of W̄.

###### Theorem 2

Let the algorithm input, x(0), be given, and let x̄ denote the corresponding average consensus vector. After k iterations, the expected mean squared error of GGE is upper bounded as follows:

 E\bigl[\|W_{GGE}(1{:}k)\,x(0) - \bar{x}\|^2\bigr] \le \|x(0) - \bar{x}\|^2 \prod_{i=1}^{k}\bigl(\lambda_2(\bar{W}) - \xi_i\bigr), \qquad (9)

where ξ_i = 0 if E[‖W_GGE(1:i−1)x(0) − x̄‖²] = 0, and otherwise

 \xi_i = \frac{E\Bigl[\sum_{s=1}^n \max_{t \in N_s}\bigl(x_s(i-1) - x_t(i-1)\bigr)^2 - \sum_{s=1}^n \frac{1}{|N_s|}\sum_{t \in N_s}\bigl(x_s(i-1) - x_t(i-1)\bigr)^2\Bigr]}{2n\, E\bigl[\|W_{GGE}(1{:}i-1)\,x(0) - \bar{x}\|^2\bigr]} \;\ge\; 0. \qquad (10)

Note that x(i−1) = W_GGE(1:i−1)x(0) is a random quantity determined by the choices of s_1, …, s_{i−1} (and the associated tie-breaks), and the expectations in (10) are over these variables.

###### Remark 1

The analogous expression for randomized gossip is simply [12]

 E\bigl[\|W_{RG}(1{:}k)\,x(0) - \bar{x}\|^2\bigr] \le \|x(0) - \bar{x}\|^2\, \lambda_2(\bar{W})^k.

(Note that here the expectation is taken with respect to both random nodes chosen at each iteration, s_k and t_k, whereas in the expressions in the theorem the only randomness is in s_k.) Since ξ_i ≥ 0 for all i, the upper bound for GGE is uniformly smaller than or equal to the upper bound for randomized gossip, for any k and any input x(0). The upper bound for randomized gossip is tight: if v denotes the eigenvector corresponding to the second-largest eigenvalue of W̄, then taking x(0) = cv for some constant c makes the bound hold with equality (in expectation).

###### Remark 2

The form of the terms ξ_i also provides insight into which scenarios are less favorable for GGE. In general, we know that randomized gossip is slow to converge on random geometric graphs [12], and so we hope that ξ_i > 0, so that GGE achieves some improvement. Note that the numerator of ξ_i measures how much larger (on average) a GGE step from x(i−1) is in comparison to the step taken by randomized gossip from the same location. There are two scenarios where the expression for ξ_i in (10) evaluates to 0. The first is when x(i−1) = x̄, in which case consensus has already been achieved. The second is when the difference between any two neighbors is constant across the network; i.e., (x_s(i−1) − x_t(i−1))² = c for every edge (s, t) ∈ E and some constant c ≥ 0. In this setting, being greedy does not provide any gain, since gossiping with any neighbor yields the same amount of immediate improvement.

[Proof of Theorem 2] We recall the known convergence rate bound for randomized gossip [12],

 E\bigl[\|W_{RG}(1{:}k)\,x(0) - \bar{x}\|^2\bigr] \le \lambda_2(\bar{W})^k\, \|x(0) - \bar{x}\|^2, \qquad (11)

and the related recursive relationship,

 E\bigl[\|W_{RG}(1{:}k)\,x(0) - \bar{x}\|^2\bigr]
 = E\Bigl[\|W_{RG}(1{:}k-1)\,x(0) - \bar{x}\|^2 - \frac{1}{2n}\sum_{s=1}^n \frac{1}{|N_s|}\sum_{t \in N_s}\bigl(x_s(k-1) - x_t(k-1)\bigr)^2\Bigr]
 \le \lambda_2(\bar{W})\, E\bigl[\|W_{RG}(1{:}k-1)\,x(0) - \bar{x}\|^2\bigr]. \qquad (12)

We can identify an equivalent relationship obtained by applying k−1 steps of GGE followed by one step of randomized gossip:

 E\bigl[\|W_{RG}(k)W_{GGE}(1{:}k-1)\,x(0) - \bar{x}\|^2\bigr]
 = E\Bigl[\|W_{GGE}(1{:}k-1)\,x(0) - \bar{x}\|^2 - \frac{1}{2n}\sum_{s=1}^n \frac{1}{|N_s|}\sum_{t \in N_s}\bigl(x_s(k-1) - x_t(k-1)\bigr)^2\Bigr]
 \le \lambda_2(\bar{W})\, E\bigl[\|W_{GGE}(1{:}k-1)\,x(0) - \bar{x}\|^2\bigr]. \qquad (13)

With this relationship in hand, we can bound the error of the GGE algorithm by adding and subtracting the effect of making the kth step a randomized gossip update:

 E\bigl[\|W_{GGE}(1{:}k)\,x(0) - \bar{x}\|^2\bigr]
 = E\Bigl[\|W_{GGE}(1{:}k-1)\,x(0) - \bar{x}\|^2 - \frac{1}{2n}\sum_{s=1}^n \frac{1}{|N_s|}\sum_{t \in N_s}\bigl(x_s(k-1) - x_t(k-1)\bigr)^2\Bigr]
 \;- E\Bigl[\frac{1}{2n}\sum_{s=1}^n \max_{t \in N_s}\bigl(x_s(k-1) - x_t(k-1)\bigr)^2 - \frac{1}{2n}\sum_{s=1}^n \frac{1}{|N_s|}\sum_{t \in N_s}\bigl(x_s(k-1) - x_t(k-1)\bigr)^2\Bigr]
 \le \bigl[\lambda_2(\bar{W}) - \xi_k\bigr]\, E\bigl[\|W_{GGE}(1{:}k-1)\,x(0) - \bar{x}\|^2\bigr]. \qquad (14)

Repeated application of this inequality, from k down to 1, yields the bound (9).

### III-C GGE Convergence Rate: Worst Case Bounds

The previous subsection related the performance of GGE to that of standard randomized gossip. Next, we seek a more direct characterization of the GGE rate of convergence in terms of properties of the underlying communication topology. We then revisit our comparison to randomized gossip. The rate of convergence of gossip algorithms is typically quantified in terms of the ε-averaging time,

 T_{ave}(\epsilon) = \sup_{x(0) \ne 0}\, \inf\left\{ k \,:\, \Pr\!\left( \frac{\|x(k) - \bar{x}\|}{\|x(0) - \bar{x}\|} \ge \epsilon \right) \le \epsilon \right\}.

Other gossip algorithms, such as randomized gossip and geographic gossip, are easily related to a homogeneous Markov chain, and their averaging times can be shown to scale as a function of the second largest eigenvalue of W̄. In particular (see Theorem 3 in [12]), T_ave(ε) ≤ 3 log ε⁻¹ / log λ₂(W̄)⁻¹. For randomized gossip, the matrix W̄ depends on the choice of probabilities assigned to each edge in the network and hence on the network topology.

Since, in each iteration of GGE, the greedy decision depends on the gossip values at each node, our algorithm cannot be related to a homogeneous Markov chain (the update matrix W(k) depends on x(k−1)). Consequently, the same machinery cannot be used to characterize the rate of convergence of GGE. The goal of this section is to bound the rate of convergence of GGE through alternative means. To this end, our main result is the following.

###### Theorem 3

Let G denote the graph on which we are gossiping, let x(k) denote the vector of GGE values after k iterations, and let x̄ denote the average consensus vector. Then

 E\bigl[\|x(k) - \bar{x}\|^2\bigr] \le A(G)^k\, \|x(0) - \bar{x}\|^2,

where A(G) is the graph-dependent constant defined as

 A(G) = \max_{x \ne \bar{x}} \frac{1}{n}\sum_{s=1}^n \left(1 - \frac{\|g_s(x)\|^2}{4\|x - \bar{x}\|^2}\right),

where g_s(x) refers to a subgradient of f_s at x, when viewing GGE as an incremental subgradient algorithm.⁶ Moreover, the ε-averaging time for GGE is bounded above by

 T_{ave}(\epsilon) \le \frac{3 \log \epsilon^{-1}}{\log A(G)^{-1}}.

⁶We explicitly note that this constant is a function of the underlying topology by writing A(G). The constant is completely determined by the neighborhood structure of the network because the maximization is over all x ≠ x̄. For a fixed x, the subgradients g_s(x) are determined by the neighborhood structure.
###### Remark 3

Note that the constant A(G) depends only on the topology of the graph. This constant plays a role for GGE similar to that played by the second-largest eigenvalue of W̄ for regular gossip algorithms.

[Proof of Theorem 3] The proof of the first part of Theorem 3 is based on an approach introduced in [23] and developed in [24] for analyzing data-adaptive algorithms. We begin by recalling the recursion for the squared error of GGE after k iterations expressed in (8):

 \|x(k) - \bar{x}\|^2 = \|x(k-1) - \bar{x}\|^2 - \tfrac{1}{4}\|g(k)\|^2
 = \left(1 - \frac{\|g(k)\|^2}{4\|x(k-1) - \bar{x}\|^2}\right)\|x(k-1) - \bar{x}\|^2,

where g(k) denotes the subgradient at iteration k (when viewing GGE as a randomized incremental subgradient algorithm), and is a random quantity depending on which node is activated at iteration k. Let M(k) = ‖x(k) − x̄‖² denote the error after k iterations, and let N(k) = 1 − ‖g(k)‖²/(4‖x(k−1) − x̄‖²) denote the amount of contraction at iteration k. Using these definitions and successive conditioning, we get

 E[M(k)] = E[N(k)M(k-1)]
 = E\bigl[E[N(k)M(k-1) \mid x(k-1)]\bigr]
 = E\bigl[M(k-1)\, E[N(k) \mid x(k-1)]\bigr]
 \;\;\vdots
 = M(0)\, E\bigl[E[N(1) \mid x(0)] \cdots E[N(k) \mid x(k-1)]\bigr].

Note that A(G) is defined in such a way that E[N(k) | x(k−1)] ≤ A(G) for all x(k−1). Therefore, it follows that

 E\bigl[\|x(k) - \bar{x}\|^2\bigr] \le A(G)^k\, \|x(0) - \bar{x}\|^2.

Next, we prove the second part of the claim: the bound on the ε-averaging time. To do this, we use the bound just derived to develop an upper bound on the probability that, after k iterations, we are still more than a factor of ε away from the initial error. Applying Markov's inequality together with the bound on E[‖x(k) − x̄‖²], we have

 \Pr\bigl(\|x(k) - \bar{x}\| \ge \epsilon \|x(0) - \bar{x}\|\bigr)
 = \Pr\bigl(\|x(k) - \bar{x}\|^2 \ge \epsilon^2 \|x(0) - \bar{x}\|^2\bigr)
 \le \frac{E\bigl[\|x(k) - \bar{x}\|^2\bigr]}{\epsilon^2 \|x(0) - \bar{x}\|^2}
 \le \epsilon^{-2} A(G)^k.

To get an upper bound on T_ave(ε), note that ε⁻²A(G)^k ≤ ε provided that k ≥ 3 log ε⁻¹ / log A(G)⁻¹. Since the bound on E[‖x(k) − x̄‖²] in the first part of the proof is based on a worst-case one-step analysis, it is an upper bound on the mean squared error at iteration k, effectively a lower bound on the rate of convergence. Therefore, we have an upper bound on the ε-averaging time for GGE; that is, T_ave(ε) ≤ 3 log ε⁻¹ / log A(G)⁻¹.

Theorem 3 provides a direct link between the rate of convergence of GGE and the underlying network topology through the constant A(G). This motivates further study of how A(G) behaves for different classes and sizes of network topologies. Next, we derive a lower bound on A(G) as a function of the maximum degree of the network, d_max. We then apply this result to characterize A(G) for two-dimensional grid and random geometric graph topologies.

###### Theorem 4

Let G be a graph on n nodes with maximum degree d_max. As above, let W̄ denote the expected update matrix for one step of randomized gossip on G. There exists a vector x, with corresponding average consensus vector x̄, such that

 \frac{E\bigl[\|W_{GGE}\,x - \bar{x}\|^2\bigr]}{\|x - \bar{x}\|^2} \ge 1 - d_{max}\bigl(1 - \lambda_2(\bar{W})\bigr), \qquad (15)

and this implies a lower bound on A(G),

 A(G) \ge 1 - d_{max}\bigl(1 - \lambda_2(\bar{W})\bigr). \qquad (16)

The proof appears below. We can use this result to relate the upper bounds on the averaging times of GGE and standard randomized gossip. Let U_GGE(G, ε) denote the upper bound on the averaging time of GGE obtained in Theorem 3, and let U_RG(G, ε) = 3 log ε⁻¹ / log λ₂(W̄)⁻¹ denote the corresponding upper bound on the averaging time of randomized gossip [12]. Using two standard inequalities for the logarithm, we obtain

 U_{GGE}(G, \epsilon) \ge \frac{3 \log \epsilon^{-1}}{d_{max}\bigl(1 - \lambda_2(\bar{W})\bigr)} \ge \frac{U_{RG}(G, \epsilon)}{d_{max}}. \qquad (17)

In words, the upper bound on the averaging time of GGE is at most a factor of d_max better than the upper bound for randomized gossip. Of course, this only links the upper bounds of the two algorithms and does not directly relate their actual performance. However, simulation results presented in the next section indicate that this relationship indeed captures the improvements seen for GGE over randomized gossip. Moreover, recall that the bounds on the expected improvement after a single gossip iteration are tight for both GGE and randomized gossip. The bound for GGE in Theorem 3 becomes an equality for k = 1 when x(0) is taken to be the x that solves the maximization defining A(G). Similarly, the bound for randomized gossip becomes an equality when x(0) is taken to be the eigenvector corresponding to the second largest eigenvalue of W̄.

We are interested in understanding the performance of GGE for applications primarily in wireless networks. Random geometric graphs, first introduced in [13], are commonly used to model connectivity in wireless networks for the purpose of analyzing the scaling behaviour of algorithms. A random geometric graph on n nodes is obtained by assigning each node i.i.d. coordinates drawn uniformly from the unit square and then connecting pairs of nodes whose distance is less than a connectivity radius r. In this paper we adopt the common scaling r = Θ(√(log n / n)), which guarantees the network is connected with high probability [13].
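A random geometric graph under this scaling can be sampled in a few lines. The exact constant inside the radius (here 2, giving r = √(2 log n / n)) is an assumption chosen for connectivity with high probability, not a value specified in the text:

```python
import numpy as np

def random_geometric_graph(n, rng):
    """Sample n points i.i.d. uniformly in the unit square and connect
    pairs closer than r = sqrt(2 log n / n). The constant 2 in the
    radius is an illustrative choice for w.h.p. connectivity."""
    pts = rng.random((n, 2))
    r = np.sqrt(2.0 * np.log(n) / n)
    # pairwise Euclidean distances, then threshold (no self-loops)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    adj = (d < r) & ~np.eye(n, dtype=bool)
    return pts, adj

pts, adj = random_geometric_graph(200, np.random.default_rng(0))
```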

For a random geometric graph with n nodes, it is known [12] that, for r as given above, every node has Θ(log n) neighbors with high probability. Thus, with d_max = Θ(log n), we see that GGE gives essentially a factor of Θ(log n) improvement in averaging time over randomized gossip. For a two-dimensional grid, d_max = 4, and GGE gives only a constant improvement in averaging time. These results are illustrated via simulation in the next section.

[Proof of Theorem 4] Our starting point is (10) from Theorem 2. Since we are focusing on the effect of applying a single gossip iteration, we drop the time index in (10) to simplify the notation:

 \xi \;=\; \frac{\mathbb{E}\left[\sum_{s=1}^{n}\left(\max_{t\in N_s}(x_s-x_t)^2-\frac{1}{|N_s|}\sum_{t\in N_s}(x_s-x_t)^2\right)\right]}{2n\,\mathbb{E}\left[\|x-\bar{x}\|^2\right]}. \quad (18)

Using the fact that the maximum of a set of non-negative values is always less than or equal to the sum of those values, we can write

 \mathbb{E}\left[\max_{t\in N_s}(x_s-x_t)^2\right] \;\le\; \frac{|N_s|}{|N_s|}\,\mathbb{E}\left[\sum_{t\in N_s}(x_s-x_t)^2\right] \quad (19)
 \;\le\; \frac{d_{\max}}{|N_s|}\,\mathbb{E}\left[\sum_{t\in N_s}(x_s-x_t)^2\right]. \quad (20)

Therefore we can upper bound ξ by

 \xi \;\le\; \frac{(d_{\max}-1)\,\mathbb{E}\left[\sum_{s=1}^{n}\frac{1}{|N_s|}\sum_{t\in N_s}(x_s-x_t)^2\right]}{2n\,\mathbb{E}\left[\|x-\bar{x}\|^2\right]}. \quad (21)

Next, take x to be the eigenvector corresponding to the second largest eigenvalue of W̄, for which (12) holds with equality, to get

 \mathbb{E}\left[-\frac{1}{2n}\sum_{s=1}^{n}\frac{1}{|N_s|}\sum_{t\in N_s}(x_s-x_t)^2\right] \;\le\; -\left(1-\lambda_2(\bar{W})\right)\mathbb{E}\left[\|x-\bar{x}\|^2\right]. \quad (22)

Applying this inequality in (21) gives

 \xi \;\le\; (d_{\max}-1)\left(1-\lambda_2(\bar{W})\right). \quad (23)

Observe that the one-step bound obtained in (9) is tight. In particular, for our choice of x as the eigenvector corresponding to the second largest eigenvalue of W̄, we have equality in (9):

 \mathbb{E}\left[\|W_{\mathrm{GGE}}\,x-\bar{x}\|^2\right] \;=\; \|x-\bar{x}\|^2\left(\lambda_2(\bar{W})-\xi\right). \quad (24)

Inserting (23) into (24) leads to the first claim of the theorem. Combining this bound with the first inequality in Theorem 3 then yields the desired lower bound on A(G).

## IV Numerical Simulations

In this section we report the results of simulations conducted to compare the performance of GGE with randomized gossip [12] and geographic gossip [10] for a variety of initial conditions. We also compare the empirically achieved convergence rates to the bound established in Section III-C and investigate how this bound behaves as the number of nodes in the network grows.

### IV-A Comparison of Convergence Rates

We first compare the convergence rates of GGE, randomized gossip, and geographic gossip by examining the reduction they achieve in relative error as a function of the number of transmissions (communication complexity). Since the number of transmissions per iteration differs for each algorithm, this is a fairer comparison than examining convergence rate relative to the number of iterations. Randomized gossip requires two wireless transmissions per iteration, GGE requires three transmissions (see the discussion in Section II), and geographic gossip requires a variable number of transmissions per iteration, depending on the number of hops between the gossiping nodes. We simulate networks with random geometric graph topologies, and all figures show averages over 100 realizations of the random geometric graph. We examine performance for four different initial conditions in order to explore the impact of the initial values on performance. The first two cases are a Gaussian bumps field and a linearly-varying field; for these, the initial value at each node is determined by sampling the field at the node's location. The remaining two initializations are the “spike” signal, constructed by setting the value of one random node to 1 and all other node values to 0, and a random initialization in which each value is drawn i.i.d. from a Gaussian distribution with zero mean and unit variance. The first three of these signals were also used to examine the performance of geographic gossip in [10].
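The four initializations above can be sketched as follows. The specific bump centers and widths, and the choice of the x-coordinate as the linearly-varying field, are illustrative assumptions rather than the paper's exact parameters:

```python
import numpy as np

def initial_values(pts, kind, rng):
    """Four test initializations x(0), sampled at node locations pts
    (an n x 2 array of unit-square coordinates)."""
    n = len(pts)
    if kind == "bumps":      # smooth field: sum of two Gaussian bumps
        c1, c2 = np.array([0.3, 0.3]), np.array([0.7, 0.7])
        return (np.exp(-np.sum((pts - c1) ** 2, axis=1) / 0.02)
                + np.exp(-np.sum((pts - c2) ** 2, axis=1) / 0.02))
    if kind == "linear":     # linearly-varying field (here: x-coordinate)
        return pts[:, 0].copy()
    if kind == "spike":      # one random node at 1, all others at 0
        x = np.zeros(n)
        x[rng.integers(n)] = 1.0
        return x
    if kind == "gaussian":   # i.i.d. zero-mean, unit-variance values
        return rng.standard_normal(n)
    raise ValueError(kind)
```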

Figs. 2(a)-(d) show that GGE converges to the average at a much faster rate (both initially and asymptotically) than randomized gossip for all initial conditions. The initial rate of convergence of GGE is faster than that of geographic gossip for all but the linearly-varying field, and for the simulated network size the asymptotic rates of reduction in relative error are similar for the two algorithms. Of these candidate initializations, the linearly-varying field is the worst case. This is not surprising, since the convergence analysis in Section III suggests that constant differences between neighbors cause GGE to provide minimal gain.

We also compare the three gossip algorithms on the grid topology. Figure 3 shows that in grid-structured networks the performance of GGE is close to that of randomized gossip (a constant improvement). Geographic gossip clearly has the best performance in this case. As discussed in Section III-C, the small number of neighbors in the grid topology restricts the improvement that GGE can achieve relative to randomized gossip.

For random geometric graph topologies, the expected node degree scales as Θ(log n), in contrast to the constant average node degree of the grid topology. GGE is therefore able to outperform randomized gossip on graph topologies where the average node degree increases with the number of nodes. To improve GGE performance on topologies with low expected node degree, we propose an extension of the algorithm. Details of this extension, which we call multi-hop GGE, are provided in Section V.

### IV-B Comparison with the Theoretical Upper Bound

We now compare the empirical average relative error for the random geometric graph with the bound developed in Theorem 3. There is no closed-form expression for A(G), so we solve the optimization problem identified in Theorem 3 numerically, using an incremental subgradient algorithm. Since the cost function depends on x only through x − x̄, we can focus on maximizing over vectors satisfying the normalization constraints in Theorem 3. In this simplified setting, the optimization can be reformulated as the minimization of a convex function over a non-convex constraint set. We approximate the solution using a projected incremental subgradient method, and to avoid local minima (the constraint set being non-convex) we rerun the optimization from multiple initial conditions. Figure 4 shows the relative error achieved by GGE as a function of the number of iterations for different initial conditions, averaged over 100 realizations of the algorithm. Also plotted is the bound identified in Theorem 3, with A(G) calculated numerically. For all but the linearly-varying field, GGE achieves a much more rapid initial decrease in error than the bound indicates. After approximately 1000 iterations, the bound provides a good indication of the rate of decrease in error. We again observe that the linearly-varying field is close to a worst-case scenario for GGE.

Next, we examine how the communication complexity scales with the number of nodes in the network. Figs. 5(a) and 5(b) display how A(G) and the averaging time scale as a function of the number of nodes n, for random geometric graph and grid topologies, respectively. To obtain the random geometric graph curves, we generate 50 random graphs for each value of n and numerically evaluate A(G) for each using the procedure detailed above. The top panels show how the values of A(G) change as the number of nodes increases. The bottom panels show the ε-averaging time, evaluated via simulation, versus the number of nodes. Note that Figs. 5(a) and 5(b) show the averaging time in terms of the number of iterations per node. The error bars depict the minimum, mean, and maximum values obtained over the 50 simulated graphs for each n. For reference, the dotted lines depict the corresponding theoretical scaling for the random geometric graph and grid topologies.
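The averaging times plotted here can be estimated empirically. The sketch below measures, for a single run of randomized gossip, the first iteration at which the relative error drops below ε; the true ε-averaging time of [12] is defined probabilistically over runs, so this single-run stopping time is only an empirical proxy:

```python
import numpy as np

def averaging_time(x0, adj, eps, rng, max_iters=10**6):
    """Empirical stopping time of randomized gossip: the first
    iteration k at which ||x(k) - xbar|| / ||x(0)|| < eps."""
    x = x0.astype(float).copy()
    xbar = x.mean()
    nbrs = [np.flatnonzero(adj[s]) for s in range(len(x))]
    denom = np.linalg.norm(x0)
    for k in range(1, max_iters + 1):
        s = rng.integers(len(x))          # node s wakes up uniformly
        t = rng.choice(nbrs[s])           # picks a neighbor uniformly
        x[s] = x[t] = 0.5 * (x[s] + x[t]) # pairwise average
        if np.linalg.norm(x - xbar) / denom < eps:
            return k
    return max_iters
```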

### IV-C Stale Information

In wireless networks, links can be unreliable, and for GGE, it is possible for nodes to miss some updates from their neighbors. Consequently, nodes will have stale information about their neighbors’ values and therefore the greedy selection in GGE may be affected. Here we investigate the effect of stale information on the performance of GGE through a simulation study.

We consider random geometric graph topologies with 200 nodes. The initial measurements correspond to sampling the Gaussian bumps field, as in Figure 2(a). As described in Section II, at each GGE iteration two nodes perform averaging, and to provide up-to-date information to their neighbors, both broadcast their new values. We simulate the case where nodes randomly miss these broadcast messages: the two gossiping nodes communicate reliably, but each eavesdropping node independently misses a neighbor's transmission with some fixed probability. Figure 6 illustrates the resulting performance degradation of GGE. Curves are shown for four values of the miss probability between 0 and 0.5, and standard randomized gossip is also shown for comparison. We conclude that GGE provides significantly better performance than randomized gossip even when 50 percent of the broadcast messages are missed.
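The stale-information experiment can be sketched as follows: each node keeps a table of eavesdropped neighbor values, greedy selection uses the (possibly stale) table, and each broadcast is missed independently with probability p_miss. This is an illustrative simulation under the stated assumptions, not the authors' experimental code:

```python
import numpy as np

def gge_stale(x0, adj, p_miss, iters, rng):
    """GGE where eavesdroppers miss each broadcast independently with
    probability p_miss; the two gossiping nodes always exchange reliably."""
    x = x0.astype(float).copy()
    n = len(x)
    nbrs = [np.flatnonzero(adj[s]) for s in range(n)]
    # table[u, v] is u's (possibly stale) belief about node v's value;
    # initialized to the true values, as if learned at startup
    table = x[None, :].repeat(n, 0).copy()
    for _ in range(iters):
        s = rng.integers(n)
        # greedy choice based on eavesdropped (possibly stale) values
        t = nbrs[s][np.argmax(np.abs(table[s, nbrs[s]] - x[s]))]
        x[s] = x[t] = 0.5 * (x[s] + x[t])
        for v in (s, t):                   # both nodes broadcast new value
            for u in nbrs[v]:
                if rng.random() > p_miss:  # eavesdropper u hears it
                    table[u, v] = x[v]
        table[s, t], table[t, s] = x[t], x[s]  # gossip pair stays in sync
    return x
```

Since every update replaces two values by their mean, the network average is preserved exactly regardless of how many broadcasts are missed; staleness only affects which pairs gossip.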

## V Multi-hop Greedy gossip with eavesdropping

As Figs. 2 and 3 indicate, the improvement of GGE over randomized gossip is smaller for the grid topology than for the random geometric graph topology. The decrease in improvement is due to the fact that the node degree in a two-dimensional grid is bounded by 4 and does not increase with the network size. Here we propose an extension that improves the performance of GGE on topologies with low average node degree. Essentially, this extension allows nodes to perform greedy gossip updates with nodes beyond their immediate one-hop neighborhood.

In one-hop GGE, at each iteration the gossiping node determines which of its neighbors has a value most different from its own. In two-hop GGE, instead of completing the update, that neighbor checks whether any of its own neighbors has a value even more different from the originating node's value; if so, the update is performed with that two-hop neighbor, and otherwise the original pair gossips. Multi-hop GGE generalizes this idea to larger neighborhoods: for example, three-hop GGE extends the search one hop further, and so on.
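The two-hop partner selection can be sketched as follows. For clarity the sketch uses exact neighbor values rather than eavesdropped tables, which is an assumption:

```python
import numpy as np

def greedy_partner_2hop(s, x, nbrs):
    """Two-hop GGE partner selection: node s finds its one-hop greedy
    choice t; t then checks whether one of its own neighbors u differs
    from x[s] even more than t does, and if so the update goes to u."""
    t = nbrs[s][np.argmax(np.abs(x[nbrs[s]] - x[s]))]
    u = nbrs[t][np.argmax(np.abs(x[nbrs[t]] - x[s]))]
    # gossip over two hops only if u strictly improves on t
    return u if abs(x[u] - x[s]) > abs(x[t] - x[s]) else t
```

For example, on the path 0–1–2–3 with values [0, 1, 5, 10], node 0's one-hop choice is node 1, but node 1's neighbor 2 differs from 0 by more, so the two-hop update pairs nodes 0 and 2.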

To observe the effect of performing greedy updates over multiple hops, we conduct an experimental comparison between 1-hop, 2-hop, and 3-hop GGE. As a point of comparison, we also include curves for randomized gossip and geographic gossip. Figure 7 illustrates the results for grid and random geometric graph topologies. In the grid, 3-hop GGE achieves an asymptotic rate of reduction in relative error that is comparable to geographic gossip. In the random geometric graph topology, the asymptotic performance of 1-hop GGE is already similar to that of geographic gossip (for a network of 200 nodes) and the multi-hop versions lead to significant improvements while still limiting all gossip exchanges to be between nodes separated by at most three hops.

## VI Conclusion

In this paper we propose a new average consensus algorithm for wireless sensor networks. Greedy gossip with eavesdropping (GGE) makes use of the broadcast nature of wireless communications and provides fast and reliable computation of the average consensus. We provide (i) a proof that GGE converges to the average consensus; (ii) a bound on the mean-squared error after a fixed number of GGE iterations; (iii) a bound on the ε-averaging time of GGE; (iv) theoretical bounds suggesting that GGE converges faster than randomized gossip; and (v) a characterization of the improvement in convergence rate achieved by GGE over randomized gossip as a function of the maximum degree. Simulation experiments compare the performance of GGE, randomized gossip [12], and geographic gossip [10], and demonstrate that the theoretical bound on mean-squared error provides a good characterization of algorithm performance. The simulation experiments also investigate the scaling behavior of the communication complexity of GGE.

GGE retains the robustness and simplicity of randomized gossip; it does not require nodes to acquire location information and it does not introduce the overhead of geographic routing. There is an additional memory overhead (nodes store their neighbors’ values), but this storage requirement is small. Nodes do need to learn their neighbors’ values, and we propose an initialization process that introduces a minor performance penalty with negligible added complexity. Since nodes eavesdrop on their neighbors’ broadcasts, they must remain in “Receive” mode throughout the entire operation of the GGE algorithm. In randomized gossip, nodes can enter “Idle” mode and only switch to “Receive” mode when they detect that a neighbor is requesting a data exchange. In a wireless sensor network implementation, this difference could lead to concerns that GGE would consume more energy than randomized gossip. However, empirical studies have shown that energy consumption in “Idle” and “Receive” modes is very similar for most existing wireless sensor network architectures [25].

Our future work will investigate the benefits of GGE in networks of mobile nodes. When nodes are mobile, other fast consensus approaches which exploit knowledge of geographic location are no longer applicable. However, because GGE is purely local and adaptive, we believe it is a promising candidate for accelerating gossip algorithms in time-varying networks. We also plan to investigate further connections between consensus algorithms and incremental subgradient optimization algorithms, towards computing more general functions than the average.

## References

• [1] D. Üstebay, M. Coates, and M. Rabbat, “Greedy gossip with eavesdropping,” in Proc. IEEE Int. Symp. on Wireless Pervasive Computing, Santorini, Greece, May 2008.
• [2] D. Üstebay, B. Oreshkin, M. Coates, and M. Rabbat, “Rates of convergence for greedy gossip with eavesdropping,” in Proc. Allerton Conf. on Comm., Control, and Computing, IL, USA, September 2008.
• [3] ——, “The speed of greed: Characterizing myopic gossip through network voracity,” to appear in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009.
• [4] J. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. dissertation, Massachusetts Institute of Technology, 1984.
• [5] S. Sundhar Ram, V. Veeravalli, and A. Nedić, “Distributed and recursive parameter estimation in parametrized linear state-space models,” Submitted, Apr. 2008.

• [6] M. Rabbat, R. Nowak, and J. Bucklew, “Robust decentralized source localization via averaging,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Phil., PA, Mar. 2005.
• [7] M. Rabbat, J. Haupt, A. Singh, and R. Nowak, “Decentralized compression and predistribution via randomized gossiping,” in Proc. Information Processing in Sensor Networks, Nashville, TN, Apr. 2006.
• [8] A. Kashyap, T. Basar, and R. Srikant, “Quantized consensus,” Automatica, vol. 43, pp. 1192–1203, Jul. 2007.
• [9] S. Sundaram and C. Hadjicostis, “Distributed function calculation and consensus using linear iterative strategies,” IEEE J. Selected Areas in Communications, vol. 26, no. 4, pp. 650–660, May 2008.
• [10] A. Dimakis, A. Sarwate, and M. Wainwright, “Geographic gossip: Efficient aggregation for sensor networks,” in Proc. Int. Conf. Inf. Proc. in Sensor Networks (IPSN), Nashville, TN, Apr. 2006.
• [11] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Systems and Control Letters, vol. 53, no. 1, pp. 65–78, Sep. 2004.
• [12] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Trans. Info. Theory, vol. 52, no. 6, pp. 2508–2530, June 2006.
• [13] P. Gupta and P. Kumar, “The capacity of wireless networks,” IEEE Trans. Info. Theory, vol. 46, no. 2, pp. 388–404, March 2000.
• [14] W. Li and H. Dai, “Location-aided distributed averaging algorithms: Performance lower bounds and cluster-based variant,” in Proc. Allerton Conf. on Comm., Control, and Computing, Urbana-Champaign, IL, Sep. 2007.
• [15] K. Jung, D. Shah, and J. Shin, “Fast gossip through lifted Markov chains,” in Proc. Allerton Conf. on Comm., Control, and Computing, Urbana-Champaign, IL, Sep. 2007.
• [16] F. Benezit, A. Dimakis, P. Thiran, and M. Vetterli, “Gossip along the way: Order-optimal consensus through randomized path averaging,” in Proc. Allerton Conf. on Comm., Control, and Computing, Urbana-Champaign, IL, Sep. 2007.
• [17] T. Aysal, M. Yildiz, and A. Scaglione, “Broadcast gossip algorithms,” in Proc. IEEE Information Theory Workshop, Porto, Portugal, May 2008.
• [18] T. Aysal, M. Yildiz, A. Sarwate, and A. Scaglione, “Broadcast gossip algorithms: Design and analysis for consensus,” in Proc. IEEE Conf. on Decision and Control, Cancun, Mexico, Dec. 2008.
• [19] S. Sundhar Ram, A. Nedić, and V. Veeravalli, “Incremental stochastic subgradient algorithms for convex optimization,” to appear in SIAM J. on Optimization.
• [20] A. Nedić and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Trans. Automatic Control, vol. 54, no. 1, pp. 48–61, Jan. 2009.
• [21] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods.   Belmont, MA: Athena Scientific, 1997.
• [22] A. Nedić and D. Bertsekas, “Incremental subgradient methods for nondifferentiable optimization,” SIAM J. on Optimization, vol. 12, no. 1, pp. 109–138, 2001.
• [23] M. Burnashev and K. Zigangirov, “An interval estimation problem for controlled observations,” Problems in Information Transmission, vol. 10, pp. 223–231, 1974.
• [24] R. Castro and R. Nowak, “Active learning and sampling,” in Foundations and Applications of Sensor Management, A. Hero, D. Castanon, D. Cochran, and K. Kastella, Eds. Springer-Verlag, 2007, pp. 177–200.
• [25] V. Raghunathan, C. Schurgers, S. Park, and M. B. Srivastava, “Energy-aware wireless microsensor networks,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 40–50, March 2002.