# Accelerated Gossip via Stochastic Heavy Ball Method

In this paper we show how the stochastic heavy ball method (SHB) -- a popular method for solving stochastic convex and non-convex optimization problems -- operates as a randomized gossip algorithm. In particular, we focus on two special cases of SHB: the Randomized Kaczmarz method with momentum and its block variant. Building upon a recent framework for the design and analysis of randomized gossip algorithms [Loizou & Richtárik, 2016], we interpret the distributed nature of the proposed methods. We present novel protocols for solving the average consensus problem where in each step all nodes of the network update their values but only a subset of them exchange their private values. Numerical experiments on popular wireless sensor networks showing the benefits of our protocols are also presented.


## I Introduction

Average consensus is a fundamental problem in distributed computing and multi-agent systems. It comes up in many real-world applications such as coordination of autonomous agents, estimation, rumour spreading in social networks, PageRank, distributed data fusion in ad-hoc networks and decentralized optimization. Due to its great importance there is much classical [35, 7] and recent [38, 37, 4] work on the design of efficient algorithms/protocols for solving it.

One of the most attractive classes of protocols for solving the average consensus problem are gossip algorithms. The development and design of gossip algorithms was studied extensively in the last decade. The seminal 2006 paper of Boyd et al. [4] on randomized gossip algorithms motivated a flurry of subsequent research, and gossip algorithms now appear in many applications, including distributed data fusion in sensor networks [38], load balancing [6] and clock synchronization [11]. For a survey of selected relevant work prior to 2010, we refer the reader to the work of Dimakis et al. [8]. For more recent results on randomized gossip algorithms we suggest [40, 17, 28, 20, 24, 1]. See also [9, 2, 29, 14].

The main goal in the design of gossip protocols is for the computation and communication to be done as quickly and efficiently as possible. In this work, our focus is precisely this. We design randomized gossip protocols which converge to consensus fast.

### I-A The average consensus problem

In the average consensus (AC) problem we are given an undirected connected network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with node set $\mathcal{V} = \{1, 2, \dots, n\}$ and edges $\mathcal{E}$. Each node $i \in \mathcal{V}$ “knows” a private value $c_i \in \mathbb{R}$. The goal of AC is for every node to compute the average of these private values, $\bar{c} := \frac{1}{n}\sum_{i=1}^n c_i$, in a distributed fashion. That is, the exchange of information can only occur between connected nodes (neighbors).

### I-B Main Contributions

We present a new class of randomized gossip protocols where in each iteration all nodes of the network update their values but only a subset of them exchange their private information. Our protocols are based on recently proposed ideas for the acceleration of randomized Kaczmarz methods for solving consistent linear systems [22], where the addition of a momentum term was shown to provide practical speedups over the vanilla Kaczmarz methods. Further, we explain the connection between gossip algorithms for solving the average consensus problem, Kaczmarz-type methods for solving consistent linear systems, and the stochastic gradient descent and stochastic heavy ball methods for solving stochastic optimization problems. We show that essentially all these algorithms behave as gossip algorithms. Finally, we explain in detail the gossip nature of two recently proposed fast Kaczmarz-type methods: the randomized Kaczmarz method with momentum (mRK) and its block variant, the randomized block Kaczmarz method with momentum (mRBK). We present a detailed comparison of our proposed gossip protocols with existing popular randomized gossip protocols, and through numerical experiments we show the benefits of our methods.

### I-C Structure of the paper

This work is organized as follows. Section II introduces the important technical preliminaries and the necessary background for understanding our methods. A new connection between gossip algorithms, Kaczmarz methods for solving linear systems and stochastic gradient descent (SGD) for solving stochastic optimization problems is also described. In Section III the two new accelerated gossip protocols are presented. Details of their behaviour and performance are also explained. Numerical evaluation of the new gossip protocols is presented in Section IV. Finally, concluding remarks are given in Section V.

### I-D Notation

The following notational conventions are used in this paper. We write $[n] := \{1, 2, \dots, n\}$. Boldface upper-case letters denote matrices; $I$ is the identity matrix. By $\mathcal{L}$ we denote the solution set of the linear system $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Throughout the paper, $x^*$ is the projection of the initial iterate $x^0$ onto $\mathcal{L}$ (that is, $x^*$ is the solution of the best approximation problem; see equation (5)). An explicit formula for the projection of $x$ onto the set $\mathcal{L}$ is given by

$$\Pi_{\mathcal{L}}(x) := \operatorname*{arg\,min}_{x' \in \mathcal{L}} \|x' - x\| = x - A^\top (A A^\top)^\dagger (Ax - b).$$
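This closed-form projection is easy to sanity-check numerically. Below is a minimal NumPy sketch (our own example, not from the paper): we project a random point onto the solution set of a small consistent system and verify both feasibility and optimality.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # full row rank almost surely
x_true = rng.standard_normal(5)
b = A @ x_true                           # consistent right-hand side
x = rng.standard_normal(5)               # point to be projected onto L

# Closed-form projection onto L = {x' : A x' = b}
proj = x - A.T @ np.linalg.pinv(A @ A.T) @ (A @ x - b)

feasibility = np.linalg.norm(A @ proj - b)   # should be (numerically) zero
```

Feasibility confirms `proj` lies in $\mathcal{L}$; comparing distances against any other feasible point (e.g. `x_true`) confirms it is the closest one.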

A matrix that often appears in our update rules is

$$H := S\,(S^\top A A^\top S)^\dagger S^\top, \tag{1}$$

where $S$ is a random matrix drawn in each step of the proposed methods from a given distribution $\mathcal{D}$, and $\dagger$ denotes the Moore-Penrose pseudoinverse. Note that $H$ is a random symmetric positive semi-definite matrix.

In the convergence analysis we use $\lambda_{\min}^+$ to indicate the smallest nonzero eigenvalue, and $\lambda_{\max}$ for the largest eigenvalue, of the matrix $W := \mathbb{E}[A^\top H A]$, where the expectation is taken over $S \sim \mathcal{D}$. Finally, $x^k = (x_1^k, \dots, x_n^k) \in \mathbb{R}^n$ represents the vector of the private values of the $n$ nodes of the network at the $k$-th iteration, while $x_i^k$ denotes the value of node $i$ at the $k$-th iteration.

## II Background - Technical Preliminaries

Our work is closely related to two recent papers. In [20], a new perspective on randomized gossip algorithms is presented. In particular, a new approach for the design and analysis of randomized gossip algorithms is proposed, and it is shown how the Randomized Kaczmarz and Randomized Block Kaczmarz methods, popular methods for solving linear systems, work as gossip algorithms when applied to a special system encoding the underlying network. In [22], several classes of stochastic optimization algorithms enriched with heavy ball momentum were analyzed. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent.

In the rest of this section we present the main results of the above papers, highlighting several connections. These results will be later used for the development of the new randomized gossip protocols.

### II-A Kaczmarz Methods and Gossip Algorithms

Kaczmarz-type methods are very popular for solving linear systems with many equations. The (deterministic) Kaczmarz method for solving consistent linear systems was originally introduced by Kaczmarz in 1937 [15]. Despite the fact that a large volume of papers was written on the topic, the first provably linearly convergent variant of the Kaczmarz method—the randomized Kaczmarz method (RK)—was developed more than 70 years later, by Strohmer and Vershynin [32]. This result sparked renewed interest in the design of randomized methods for solving linear systems [25, 26, 10, 23, 39, 27, 31, 18]. More recently, Gower and Richtárik [12] provided a unified analysis of several randomized iterative methods for solving linear systems using a sketch-and-project framework. We adopt this framework in this paper.

In particular, the sketch-and-project algorithm [12] for solving the consistent linear system $Ax = b$ has the form

$$x^{k+1} = x^k - A^\top S_k (S_k^\top A A^\top S_k)^\dagger S_k^\top (A x^k - b) = x^k - A^\top H_k (A x^k - b), \tag{2}$$

where in each iteration the matrix $S_k$ is sampled afresh from an arbitrary distribution $\mathcal{D}$. In [12] it was shown that many popular algorithms for solving linear systems, including the RK method and the randomized coordinate descent method, can be cast as special cases of the above update by choosing an appropriate distribution $\mathcal{D}$. (In order to recover a randomized coordinate descent method, one also needs to perform projections with respect to a more general Euclidean norm. However, for simplicity, in this work we only consider the standard Euclidean norm.) The special cases that we are interested in are the randomized Kaczmarz method (RK) and its block variant, the randomized block Kaczmarz method (RBK).

Let $e_i \in \mathbb{R}^m$ be the $i$-th unit coordinate vector and let $I_{:C}$ be the column submatrix of the $m \times m$ identity matrix with columns indexed by the set $C \subseteq [m]$. Then the RK and RBK methods can be obtained as special cases of the update rule (2) as follows:

• RK: Let $S_k = e_i$, where $i \in [m]$ is chosen in each iteration independently, with probability $p_i > 0$. In this setup the update rule (2) simplifies to

$$x^{k+1} = x^k - \frac{A_{i:}x^k - b_i}{\|A_{i:}\|_2^2}\, A_{i:}^\top. \tag{3}$$
• RBK: Let $S_k = I_{:C}$, where the set $C \subseteq [m]$ is chosen in each iteration independently, with probability $p_C \geq 0$. In this setup the update rule (2) simplifies to

$$x^{k+1} = x^k - A_{C:}^\top (A_{C:} A_{C:}^\top)^\dagger (A_{C:} x^k - b_C). \tag{4}$$
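To make the two special cases concrete, the following is a minimal NumPy sketch of the RK step (3) and the RBK step (4) on a small synthetic consistent system. The helper names (`rk_step`, `rbk_step`) and the uniform row sampling are our own illustrative choices, not prescribed by [12].

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 4
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)           # consistent system
x = rng.standard_normal(n)

def rk_step(x, A, b, i):
    # (3): project x onto the hyperplane {x : A_i x = b_i}
    a = A[i]
    return x - (a @ x - b[i]) / (a @ a) * a

def rbk_step(x, A, b, C):
    # (4): project x onto the subsystem {x : A_C x = b_C}
    Ac, bc = A[C], b[C]
    return x - Ac.T @ np.linalg.pinv(Ac @ Ac.T) @ (Ac @ x - bc)

# one RBK step makes the selected rows exactly satisfied
x_blk = rbk_step(x, A, b, [0, 1, 2])

# repeated RK steps (uniform row sampling) converge to a solution
for _ in range(2000):
    x = rk_step(x, A, b, rng.integers(m))
residual = np.linalg.norm(A @ x - b)
```

One RBK step zeroes the residual of the selected block, while iterated RK steps drive the full residual to zero linearly.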

In this paper we are interested in two particular extensions of the above methods: the randomized Kaczmarz method with momentum (mRK) and its block variant, the randomized block Kaczmarz method with momentum (mRBK), both proposed and analyzed in [22]. Before we describe these two algorithms, let us summarize the main connections between Kaczmarz methods for solving linear systems and gossip algorithms, as presented in [20].

In [13, 30, 22], it was shown that even in the case of consistent linear systems with multiple solutions, Kaczmarz-type methods converge linearly to one particular solution: the projection of the initial iterate onto the solution set of the linear system. This naturally leads to the formulation of the best approximation problem:

$$\min_{x = (x_1, \dots, x_n) \in \mathbb{R}^n} \frac{1}{2}\|x - x^0\|^2 \quad \text{subject to} \quad Ax = b. \tag{5}$$

Above, $x^0 \in \mathbb{R}^n$ is an arbitrary starting point and $\|\cdot\|$ is the standard Euclidean norm. By $x^*$ we denote the solution of (5).

In [20] it was shown how RK and RBK work as gossip algorithms when applied to a special linear system encoding the underlying network.

###### Definition II.1 ([20])

A linear system $Ax = b$ is called an “average consensus (AC) system” when $Ax = b$ is equivalent to saying that $x_i = x_j$ for all $(i, j) \in \mathcal{E}$.

Note that many linear systems satisfy the above definition. For example, we can choose $b = 0$ and $A \in \mathbb{R}^{|\mathcal{E}| \times n}$ to be the incidence matrix of $\mathcal{G}$. In this case, the row of the system corresponding to edge $e = (i, j)$ directly encodes the constraint $x_i = x_j$. A different choice is to pick $b = 0$ and $A = L$, where $L$ is the Laplacian matrix of $\mathcal{G}$. Note that depending on what AC system is used, RK and RBK have different interpretations as gossip protocols.
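As a concrete illustration of the incidence-matrix choice, here is a tiny example (our own construction): for a path graph on four nodes, each row of $A$ encodes one constraint $x_i = x_j$, so $Ax = 0$ holds exactly for consensus vectors.

```python
import numpy as np

# Path graph on 4 nodes with edges (0,1), (1,2), (2,3)
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
A = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    A[e, i], A[e, j] = 1.0, -1.0        # row e encodes x_i - x_j = 0
b = np.zeros(len(edges))

consensus = np.full(n, 3.7)             # all nodes hold the same value
non_consensus = np.array([1.0, 1.0, 1.0, 2.0])

is_solution = np.allclose(A @ consensus, b)      # consensus solves Ax = b
violates = np.allclose(A @ non_consensus, b)     # non-consensus does not
```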

From now on we work with the AC system described in the first example. Since $b = 0$, the general sketch-and-project update rule (2) simplifies to:

$$x^{k+1} = \big[I - A^\top H_k A\big]x^k. \tag{6}$$

The convergence performance of RK and RBK for solving the best approximation problem (and as a result the average consensus problem) is described by the following theorem.

###### Theorem II.2 ([12, 13])

Let $\{x^k\}_{k=0}^\infty$ be the iterates produced by (2). Then

$$\mathbb{E}\big[\|x^k - x^*\|^2\big] \leq \rho^k\, \|x^0 - x^*\|^2,$$

where $x^*$ is the solution of (5), $\rho := 1 - \lambda_{\min}^+(W)$, and $\lambda_{\min}^+(W)$ denotes the minimum nonzero eigenvalue of $W := \mathbb{E}[A^\top H A]$.

In [20], the behavior of both RK and RBK as gossip algorithms was described, and a comparison with the convergence results of existing randomized gossip protocols was made. In particular, it was shown that the most basic randomized gossip algorithm [4] (“randomly pick an edge $(i, j)$ and then replace the values stored at vertices $i$ and $j$ by their average”) is an instance of RK applied to the linear system $Ax = 0$, where $A$ is the incidence matrix of $\mathcal{G}$. RBK can also be interpreted as a gossip algorithm:

###### Theorem II.3 ([20], RBK as a Gossip Algorithm)

Each iteration of RBK for solving $Ax = 0$ works as follows: 1) Select a random set of edges $C \subseteq \mathcal{E}$. 2) Form the subgraph $\mathcal{G}_k$ of $\mathcal{G}$ from the selected edges. 3) For each connected component of $\mathcal{G}_k$, replace the node values with their average.
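Theorem II.3 can be verified on a toy network. In the sketch below (our own example), selecting two non-adjacent edges of a 5-node ring creates two two-node components plus a singleton; one RBK step averages each component and leaves the singleton untouched.

```python
import numpy as np

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]   # 5-node ring
A = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    A[e, i], A[e, j] = 1.0, -1.0                   # incidence-matrix AC system

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # current private values
C = [0, 3]                                         # select edges (0,1) and (3,4)

# One RBK step on Ax = 0: project x onto {x : A_C x = 0}
Ac = A[C]
x_new = x - Ac.T @ np.linalg.pinv(Ac @ Ac.T) @ (Ac @ x)
# components {0,1} and {3,4} are averaged; node 2 is untouched
```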

### II-B The Heavy Ball momentum

A detailed study of several (equivalent) stochastic reformulations of consistent linear systems was developed in [30]. This new viewpoint facilitated the development and analysis of relaxed variants (with relaxation parameter $\omega$) of the sketch-and-project update (2). In particular, one of the reformulations is the stochastic optimization problem

$$\min_{x \in \mathbb{R}^n} f(x) := \mathbb{E}_{S \sim \mathcal{D}}[f_S(x)], \quad \text{where} \tag{7}$$

$$f_S(x) := \frac{1}{2}\|Ax - b\|_H^2 = \frac{1}{2}(Ax - b)^\top H (Ax - b), \tag{8}$$

and $H$ is the random symmetric positive semi-definite matrix defined in (1).

Under certain (weak) conditions on $\mathcal{D}$, the set of minimizers of $f$ is identical to the set of solutions of the linear system. In [30], problem (7) was solved via Stochastic Gradient Descent (SGD):

$$x^{k+1} = x^k - \omega \nabla f_{S_k}(x^k), \tag{9}$$

and a linear rate of convergence was proved despite the fact that $f$ is not necessarily strongly convex and that a fixed stepsize $\omega > 0$ is used. Observe that the gradient of the stochastic function (8) is given by

$$\nabla f_{S_k}(x) = A^\top H_k (Ax - b), \tag{10}$$

and as a result it is easy to see that for $\omega = 1$, the SGD update (9) reduces to the sketch-and-project update (2).

The recent works [21, 22] analyze momentum variants of SGD, with the goal of accelerating the convergence of the method for solving problem (7). SGD with momentum, also known as the stochastic heavy ball method (SHB), is a well-known algorithm in the optimization literature for solving stochastic optimization problems, and it is extremely popular in areas such as deep learning [33, 34, 16, 36]. However, even though SHB is used extensively in practice, its theoretical convergence behavior is not well understood. To the best of our knowledge, [21, 22] are the first works that prove linear convergence of SHB in any setting.

The update rule of SHB for solving problem (7), with stepsize $\omega > 0$ and momentum parameter $\beta \geq 0$, is formally stated as Algorithm 1. Using the expression for the stochastic gradient (10), the update rule of SHB can be written more explicitly:

$$x^{k+1} = x^k - \omega A^\top H_k (A x^k - b) + \beta(x^k - x^{k-1}). \tag{11}$$

Using the same choice of distribution $\mathcal{D}$ as in equations (3) and (4), we now obtain momentum variants of RK and RBK:

• RK with momentum (mRK):

$$x^{k+1} = x^k - \omega\,\frac{A_{i:}x^k - b_i}{\|A_{i:}\|_2^2}\, A_{i:}^\top + \beta(x^k - x^{k-1})$$

• RBK with momentum (mRBK):

$$x^{k+1} = x^k - \omega A_{C:}^\top (A_{C:} A_{C:}^\top)^\dagger (A_{C:}x^k - b_C) + \beta(x^k - x^{k-1})$$
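Specialized to the incidence-matrix AC system ($b = 0$), mRK is only a few lines of code. The following simulation sketch uses illustrative parameter values ($\omega = 1$, $\beta = 0.3$), not the tuned choices of [22]; all nodes apply the momentum term each iteration, while only the chosen pair exchanges values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
edges = [(i, (i + 1) % n) for i in range(n)]       # ring network
c = rng.standard_normal(n)                         # private starting values
avg = c.mean()

omega, beta = 1.0, 0.3                             # illustrative parameters
x_prev, x = c.copy(), c.copy()                     # x^0 = x^1
for _ in range(5000):
    i, j = edges[rng.integers(len(edges))]         # pick an edge uniformly
    x_next = x + beta * (x - x_prev)               # all nodes apply momentum
    x_next[i] += 0.5 * omega * (x[j] - x[i])       # the chosen pair also
    x_next[j] += 0.5 * omega * (x[i] - x[j])       # moves toward its average
    x_prev, x = x, x_next
max_err = np.max(np.abs(x - avg))
```

Since mass is preserved (see Section III-D), the iterates converge to the exact average of the starting values.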

In [22], two main theoretical results describing the behavior of SHB (and as a result also the special cases mRK and mRBK) were presented:

###### Theorem II.4 (Theorem 1, [22])

Choose $x^0 = x^1 \in \mathbb{R}^n$. Let $\{x^k\}_{k=0}^\infty$ be the sequence of random iterates produced by SHB, and let $\lambda_{\min}^+$ (resp. $\lambda_{\max}$) be the smallest nonzero (resp. largest) eigenvalue of $W := \mathbb{E}[A^\top H A]$. Assume $0 < \omega < 2$ and $\beta \geq 0$, and that $\omega$ and $\beta$ satisfy the joint condition of [22, Theorem 1]. Then

$$\mathbb{E}\big[\|x^k - x^*\|^2\big] \leq q^k (1 + \delta)\, \|x^0 - x^*\|^2, \tag{12}$$

where the rate $q \in (0, 1)$ and the constant $\delta \geq 0$ are explicit functions of $\omega$, $\beta$, $\lambda_{\min}^+$ and $\lambda_{\max}$ given in [22]. Moreover, $q < 1$, so SHB converges linearly.

###### Theorem II.5 (Theorem 4, [22])

Let $\{x^k\}_{k=0}^\infty$ be the sequence of random iterates produced by SHB, started with $x^0 = x^1 \in \mathbb{R}^n$, with relaxation parameter (stepsize) $\omega > 0$ and momentum parameter $\beta \geq 0$ chosen as in [22, Theorem 4]. Then there exists a constant $C > 0$ such that for all $k \geq 0$ we have

$$\|\mathbb{E}[x^k - x^*]\|^2 \leq q^k\, C,$$

where the rate $q$ is given in [22, Theorem 4].

Using Theorem II.5, and by a proper combination of the stepsize $\omega$ and the momentum parameter $\beta$, SHB enjoys an accelerated linear convergence rate in mean [22].

###### Corollary 1

(i) If $\omega = 1$ and $\beta$ is set as in [22], then the iteration complexity of SHB becomes $O\big(\sqrt{1/\lambda_{\min}^+}\,\log(1/\epsilon)\big)$.
(ii) If $\omega = 2/(\lambda_{\min}^+ + \lambda_{\max})$ and $\beta$ is set as in [22], then the iteration complexity of SHB becomes $O\big(\sqrt{\lambda_{\max}/\lambda_{\min}^+}\,\log(1/\epsilon)\big)$.

## III Randomized Gossip protocols with momentum

Having presented SHB for solving the stochastic optimization problem (7) and described its sketch-and-project nature (11), let us now describe its behavior as a randomized gossip protocol when applied to solving the AC system $Ax = 0$, where $A$ is the incidence matrix of the network.

Since $b = 0$, method (11) can be simplified to:

$$x^{k+1} = x^k - \omega A^\top H_k A\, x^k + \beta(x^k - x^{k-1}). \tag{13}$$

In the rest of this section we focus on two special cases of (13): RK with momentum and RBK with momentum.

### III-A Randomized Kaczmarz Gossip with momentum

When RK is applied to solve an AC system, one recovers the famous pairwise gossip algorithm [4]. Algorithm 2 describes how the relaxed variant of randomized Kaczmarz with momentum behaves as a gossip algorithm. See also Figure 1 for a graphical illustration of the method.

###### Remark III.1

In the special case of $\beta = 0$ (zero momentum), only the two nodes of the selected edge $(i, j)$ update their values. In this case the two nodes do not update their values to their exact average but to a convex combination that depends on the stepsize $\omega$. To obtain the pairwise gossip algorithm of [4], we should further choose $\omega = 1$.

Distributed Nature of the Algorithm: Here we highlight a few ways to implement mRK in a distributed fashion:

Asynchronous pairwise broadcast gossip: In this protocol each node $i$ of the network has a clock that ticks at the times of a rate-1 Poisson process. The inter-tick times are exponentially distributed, independent across nodes, and independent across time. This is equivalent to a global clock ticking at a rate-$n$ Poisson process which wakes up an edge of the network at random. In particular, in this implementation mRK works as follows: in the $k$-th iteration (time slot) the clock of node $i$ ticks and node $i$ randomly contacts one of its neighbors $j$; simultaneously, it broadcasts a signal to inform the nodes of the whole network that it is updating (this signal does not contain any private information of node $i$). The two nodes share their information and update their private values following the update rule of Algorithm 2, while all the other nodes update their values using only their own information. In each iteration only one pair of nodes exchanges their private values.

Synchronous pairwise gossip: In this protocol a single global clock is available to all nodes. Time is assumed to be slotted commonly across nodes, and in each time slot only one pair of nodes of the network is randomly activated and exchanges information following the update rule of Algorithm 2. The remaining non-activated nodes update their values using their own last two private values. Note that this implementation of mRK comes with the disadvantage that it requires a central entity which chooses the activated pair of nodes in each step.

Asynchronous pairwise gossip with common counter: The update rule of the nodes of the active pair $(i, j)$ in Algorithm 2 can be rewritten as follows:

$$x_i^{k+1} = x_i^k + \beta(x_i^k - x_i^{k-1}) + \frac{\omega}{2}(x_j^k - x_i^k)$$

$$x_j^{k+1} = x_j^k + \beta(x_j^k - x_j^{k-1}) + \frac{\omega}{2}(x_i^k - x_j^k)$$

In particular, observe that their update rule contains the expression $x^k + \beta(x^k - x^{k-1})$, which is precisely the update of all non-activated nodes of the network. Thus, if we assume that the nodes share a common counter that tracks how many iterations have taken place, and that each node also stores the last iteration at which it was activated, then the algorithm can work in a distributed fashion as follows.

Let $k$ denote the total iteration counter (common counter) available to the activated nodes of each step, and let $t_i$ be the number of iterations between the current iteration and the last time node $i$ was picked. Then the update rule of Algorithm 2 can be equivalently expressed as:

• Pick an edge $(i, j)$ at random following the given distribution.

• The private values of the activated nodes $i$ and $j$ are updated as follows:

$$x_i^{k+1} = \Phi^{t_i}\big[x_i^k + \beta(x_i^k - x_i^{k-1})\big] + \frac{\omega}{2}(x_j^k - x_i^k)$$

$$x_j^{k+1} = \Phi^{t_j}\big[x_j^k + \beta(x_j^k - x_j^{k-1})\big] + \frac{\omega}{2}(x_i^k - x_j^k)$$

where $\Phi^{t}$ denotes the $t$-fold application of the idle (momentum-only) update $x \mapsto x + \beta(x - x^{\mathrm{prev}})$, which lets the activated nodes catch up on the iterations they skipped. Any other node $\ell$ performs no computation: $x_\ell^{k+1} = x_\ell^k$, and it catches up at its next activation.
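The catch-up trick works because the idle update is linear: $t$ skipped iterations can be replayed in one shot by powering a $2 \times 2$ matrix acting on the pair (current value, previous value). A minimal sketch of this observation (our own formulation):

```python
import numpy as np

beta = 0.3
# one idle update (x, x_prev) -> (x + beta*(x - x_prev), x) in matrix form
M = np.array([[1 + beta, -beta],
              [1.0,       0.0]])

state = np.array([5.0, 4.0])          # (current value, previous value)

# replay seven missed iterations one at a time ...
looped = state.copy()
for _ in range(7):
    looped = M @ looped

# ... or catch up in a single burst using the common counter
burst = np.linalg.matrix_power(M, 7) @ state
```

Both routes produce the same state, so an idle node only needs to store two registers and the iteration index of its last activation.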

### III-B Connection with the accelerated gossip algorithm

In the randomized gossip literature there is one particular method closely related to our approach. It was first proposed in [5] and its analysis under strong conditions was presented in [17]. In these works, local memory is exploited by installing shift registers at each agent. In particular, we are interested in the case of just two registers, where the first stores the agent's current value and the second the agent's value before the latest update. The algorithm can be described as follows. Suppose that edge $(i, j)$ is chosen at time $k$. Then,

• Node $i$: $x_i^{k+1} = \lambda\,\frac{x_i^k + x_j^k}{2} + (1 - \lambda)\, x_i^{k-1}$

• Node $j$: $x_j^{k+1} = \lambda\,\frac{x_i^k + x_j^k}{2} + (1 - \lambda)\, x_j^{k-1}$

• Any other node $\ell$: $x_\ell^{k+1} = x_\ell^k$

where $\lambda \in [1, 2)$ is a fixed parameter. The method was analyzed in [17] under a strong assumption on the probabilities of choosing the pair of nodes, an assumption that, as the authors mention, is unrealistic in practical scenarios, and only for specific networks like random geometric graphs. At this point we should highlight that the results presented in [22] hold for essentially any distribution $\mathcal{D}$, and as a result such an issue cannot occur in our approach.

Note also that if we choose $\beta = \omega - 1$ in the update rule of Algorithm 2, then our method simplifies to:

• Node $i$: $x_i^{k+1} = \omega\,\frac{x_i^k + x_j^k}{2} + (1 - \omega)\, x_i^{k-1}$

• Node $j$: $x_j^{k+1} = \omega\,\frac{x_i^k + x_j^k}{2} + (1 - \omega)\, x_j^{k-1}$

• Any other node $\ell$: $x_\ell^{k+1} = x_\ell^k + (\omega - 1)(x_\ell^k - x_\ell^{k-1})$

In order to apply Theorem II.4, we need to assume $0 < \omega < 2$ and $\beta = \omega - 1 \geq 0$, which also means that $\omega \in [1, 2)$. Thus, for $\omega \in [1, 2)$ and momentum parameter $\beta = \omega - 1$, it is easy to see that our approach is very similar to the shift-register algorithm with $\lambda = \omega$. Both methods update the selected pair of nodes in the same way. However, in our case the other nodes of the network do not remain idle but instead also update their values using their own previous information.

Using a diagonal momentum matrix $B$, the two algorithms above can both be expressed as:

$$x^{k+1} = x^k - \frac{\omega}{2}(x_i^k - x_j^k)(e_i - e_j) + B(x^k - x^{k-1}). \tag{14}$$

In particular, in our algorithm every element on the diagonal of $B$ is equal to $\beta$, while in [5] all values on the diagonal are zero except for the two entries $B_{ii} = B_{jj} = \beta$ corresponding to the selected pair.
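The correspondence between the two methods on the selected pair is a short algebraic identity, which can be checked numerically. In the sketch below (illustrative numbers, our own variable names), the mRK pair update with $\beta = \omega - 1$ coincides with the shift-register update with $\lambda = \omega$:

```python
omega = 1.4
beta = omega - 1.0                  # the coupling beta = omega - 1
lam = omega                         # shift-register parameter lambda = omega

xi, xj = 2.0, 7.0                   # current values of the chosen pair (i, j)
xi_prev, xj_prev = 1.5, 6.0         # previous values (second register)

# mRK update of the selected pair (Algorithm 2)
mrk_i = xi + beta * (xi - xi_prev) + 0.5 * omega * (xj - xi)
mrk_j = xj + beta * (xj - xj_prev) + 0.5 * omega * (xi - xj)

# shift-register update of [5, 17]
sr_i = lam * (xi + xj) / 2 + (1 - lam) * xi_prev
sr_j = lam * (xi + xj) / 2 + (1 - lam) * xj_prev
```

The two pairs of values agree up to floating-point rounding for any $\omega \in [1, 2)$; the methods differ only in how the non-selected nodes behave.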

###### Remark III.2

The shift-register case and our algorithm can be seen as two limit cases of the update rule (14). In particular, the shift-register method uses only two non-zero diagonal elements in $B$, while our method has a full diagonal. We believe that further methods can be developed in the future by exploring the cases where more than two, but not all, elements of the diagonal matrix are non-zero. It might be possible to obtain better convergence if one carefully chooses these values based on the network topology. We leave this as an open problem for future research.

### III-C Randomized block Kaczmarz gossip with momentum

Recall that Theorem II.3 says how RBK (with no momentum and no relaxation) can be interpreted as a gossip algorithm. Now we use this result to explain how relaxed RBK with momentum works. Note that the update rule of RBK with momentum can be rewritten as follows:

$$x^{k+1} = \omega\big(I - A^\top H_k A\big)x^k + (1 - \omega)x^k + \beta(x^k - x^{k-1}), \tag{15}$$

where $(I - A^\top H_k A)x^k$ is the update rule of RBK (6).

Thus, in analogy to the simple RBK, in the $k$-th step a random set of edges is selected and $q$ connected components are formed as a result. This includes the connected components of the sub-graph $\mathcal{G}_k$ and also the singleton connected components (nodes outside $\mathcal{G}_k$). Let $\mathcal{V}_r^k$ be the set of nodes that belong to the $r$-th connected component at the $k$-th step, such that $\mathcal{V} = \cup_{r=1}^q \mathcal{V}_r^k$ and $\mathcal{V}_r^k \cap \mathcal{V}_s^k = \emptyset$ for any $r \neq s$.

Using the update rule (15), Algorithm 3 shows how mRBK updates the private values of the nodes of the network (see also Figure 2 for a graphical interpretation).

Note that in the update rule of mRBK the nodes that are not attached to a selected edge (i.e., do not belong to the sub-graph $\mathcal{G}_k$) update their values via $x_\ell^{k+1} = x_\ell^k + \beta(x_\ell^k - x_\ell^{k-1})$. By considering these nodes as singleton connected components, their update rule is exactly the same as for the nodes of the sub-graph $\mathcal{G}_k$. This is easy to see as follows: for a singleton component $\mathcal{V}_r^k = \{\ell\}$,

$$x_\ell^{k+1} = \omega \sum_{j \in \mathcal{V}_r^k} \frac{x_j^k}{|\mathcal{V}_r^k|} + (1 - \omega)x_\ell^k + \beta(x_\ell^k - x_\ell^{k-1}) = \omega x_\ell^k + (1 - \omega)x_\ell^k + \beta(x_\ell^k - x_\ell^{k-1}) = x_\ell^k + \beta(x_\ell^k - x_\ell^{k-1}). \tag{17}$$
###### Remark III.3

In the special case that only one edge is selected in each iteration, the update rule of mRBK simplifies to the update rule of mRK. In this case the sub-graph $\mathcal{G}_k$ consists of the two nodes of the single selected edge.

###### Remark III.4

In [20] it was shown that several existing gossip protocols for solving the average consensus problem are special cases of the simple RBK (Theorem II.3). For example, two gossip algorithms that can be cast as special cases of the simple RBK are the path averaging proposed in [3] and clique gossiping [19]. In path averaging, in each iteration a path of nodes is selected and its nodes update their values to their exact average ($\omega = 1$). In clique gossiping, the network is already divided into cliques and, through a random procedure, a clique is activated and its nodes update their values to their exact average ($\omega = 1$). Since mRBK contains the simple RBK as a special case (for $\omega = 1$ and $\beta = 0$), we expect that these special protocols can also be accelerated with the addition of a momentum parameter $\beta$.

### III-D Mass preservation

One of the key properties of some of the most efficient randomized gossip algorithms is mass preservation. If a gossip algorithm has this property, the sum (and as a result the average) of the private values of the nodes remains fixed during the iterative procedure. That is, $\sum_{i=1}^n x_i^k = \sum_{i=1}^n x_i^0$ for all $k \geq 0$. The original pairwise gossip algorithm proposed in [4] satisfies the mass preservation property, while existing accelerated gossip algorithms [5, 17] preserve a scaled sum.

In this section we show that the two proposed protocols presented above also possess a mass preservation property. In particular, we prove mass preservation for the block randomized gossip protocol with momentum (Algorithm 3). This is sufficient, since the Kaczmarz gossip with momentum (mRK) can be cast as a special case.

###### Theorem III.1

Assume that $x^0 = x^1$. That is, the two registers of each node have the same initial value. Then for Algorithms 2 and 3 we have $\sum_{i=1}^n x_i^{k+1} = \sum_{i=1}^n x_i^k$ for any $k \geq 0$, and as a result, $\frac{1}{n}\sum_{i=1}^n x_i^k = \bar{c}$ for all $k$.

We prove the result for the more general Algorithm 3. Assume that in the $k$-th step of the method $q$ connected components are formed. Let the set of nodes of the $r$-th connected component be $\mathcal{V}_r^k$, so that $\mathcal{V} = \cup_{r=1}^q \mathcal{V}_r^k$ and $\mathcal{V}_r^k \cap \mathcal{V}_s^k = \emptyset$ for any $r \neq s$. Thus:

$$\sum_{i=1}^n x_i^{k+1} = \sum_{i \in \mathcal{V}_1^k} x_i^{k+1} + \dots + \sum_{i \in \mathcal{V}_q^k} x_i^{k+1}. \tag{18}$$

Let us first focus, without loss of generality, on the $r$-th connected component and simplify the expression for the sum of its nodes:

$$\sum_{i \in \mathcal{V}_r^k} x_i^{k+1} = \sum_{i \in \mathcal{V}_r^k} \Big[\omega \sum_{j \in \mathcal{V}_r^k} \frac{x_j^k}{|\mathcal{V}_r^k|} + (1 - \omega)x_i^k + \beta(x_i^k - x_i^{k-1})\Big] = \sum_{i \in \mathcal{V}_r^k} x_i^k + \beta \sum_{i \in \mathcal{V}_r^k}(x_i^k - x_i^{k-1}).$$

By substituting this for all $r$ into the right-hand side of (18), we get $\sum_{i=1}^n x_i^{k+1} = \sum_{i=1}^n x_i^k + \beta \sum_{i=1}^n (x_i^k - x_i^{k-1})$. Since $x^0 = x^1$, we have $\sum_{i=1}^n x_i^1 = \sum_{i=1}^n x_i^0$, and as a result, by induction, $\sum_{i=1}^n x_i^{k+1} = \sum_{i=1}^n x_i^k$ for all $k$.
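The theorem is also easy to confirm empirically: running many mRBK steps (15) from equal initial registers leaves the sum of the values unchanged up to floating-point error. A small simulation sketch (graph, parameters and sampling are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # 6-node ring
A = np.zeros((len(edges), n))
for e, (i, j) in enumerate(edges):
    A[e, i], A[e, j] = 1.0, -1.0

omega, beta = 0.8, 0.3
x_prev = rng.standard_normal(n)
x = x_prev.copy()                       # x^0 = x^1: equal registers
total0 = x.sum()

for _ in range(200):
    C = rng.choice(len(edges), size=2, replace=False)   # random edge block
    Ac = A[C]
    proj = x - Ac.T @ np.linalg.pinv(Ac @ Ac.T) @ (Ac @ x)   # RBK part (6)
    x_next = omega * proj + (1 - omega) * x + beta * (x - x_prev)  # (15)
    x_prev, x = x, x_next
drift = abs(x.sum() - total0)
```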

## IV Numerical Evaluation

We devote this section to the experimental evaluation of the proposed gossip algorithms: mRK and mRBK. In particular, we perform three experiments. In the first two we focus on the performance of mRK, while in the last one on its block variant mRBK. In comparing the methods with their momentum variants we use the relative error measure $\|x^k - x^*\|^2 / \|x^0 - x^*\|^2$, where the starting vectors of values $x^0 = c$ are taken to be always Gaussian vectors. For all of our experiments the horizontal axis represents the number of iterations. The networks used in the experiments are the cycle (ring graph), the 2-dimensional grid and the random geometric graph (RGG) with a given radius. Code was written in Julia 0.6.3.

### IV-A Impact of momentum parameter on mRK

Recall that in the simple pairwise gossip algorithm the two nodes that exchange information update their values to their exact average while all the other nodes remain idle. In our framework this method can be cast as a special case of mRK when $\omega = 1$ and $\beta = 0$. In this experiment we always keep the stepsize fixed at $\omega = 1$, which means that the pair of chosen nodes update their values to their exact average. We show that by choosing a suitable momentum parameter $\beta > 0$ we obtain faster convergence for all networks under study. See Figure 3 for more details.

### IV-B Comparison with the Shift-Register

In this experiment we compare mRK with the shift-register algorithm, choosing $\omega$ and $\beta$ so as to satisfy the connection established in Section III-B. That is, we choose $\beta = \omega - 1$ for any choice of $\omega$. Observe that in all plots of Figure 4 our algorithm outperforms the corresponding shift-register variant.

### IV-C Impact of momentum parameter on mRBK

In this experiment our goal is to show that the addition of momentum accelerates the RBK gossip algorithm proposed in [20]. Without loss of generality we keep the block size fixed; that is, the random matrix $S_k$ in the update rule of mRBK is always a column submatrix of the identity matrix with a fixed number of columns. Thus, in each iteration a fixed number of edges of the network is chosen to form the subgraph $\mathcal{G}_k$, and the values of the nodes are updated according to Algorithm 3. Note that similar plots can be obtained for any choice of block size. We run all algorithms with a fixed stepsize $\omega$. It is obvious that by choosing a suitable momentum parameter $\beta > 0$ we obtain faster convergence than with $\beta = 0$, for all networks under study. See Figure 5 for more details.

## V Conclusion and Future research

In this paper we presented new accelerated randomized gossip algorithms using tools from numerical linear algebra and the area of randomized Kaczmarz methods for solving linear systems. In particular, using recently developed results on the stochastic reformulation of consistent linear systems, we explained how the stochastic heavy ball method for solving a specific quadratic stochastic optimization problem can be interpreted as a gossip algorithm. To the best of our knowledge, this is the first time that such protocols have been presented for the average consensus problem. We believe that this work opens up many possible avenues for future research. For example, using other Kaczmarz-type methods to solve particular linear systems, we can obtain novel distributed protocols for average consensus. In addition, we believe that the gossip protocols presented in this work can be extended to the more general setting of distributed optimization, where the goal is to minimize the average of convex functions in a distributed fashion.