1 Introduction
Distributed averaging is a fundamental problem in distributed computing and multiagent systems [1, 2]. Randomized gossip algorithms are one of the most popular classes of methods for solving it. The seminal 2006 paper of Boyd et al. [3] on randomized gossip algorithms motivated a flurry of subsequent research, and gossip algorithms now appear in many applications, including distributed data fusion in sensor networks [4], load balancing [5] and clock synchronization [6]. The development and design of efficient gossip algorithms has been studied extensively in the last decade. For selected relevant work prior to 2010, we refer the reader to the survey [7]. For more recent results on randomized gossip algorithms we suggest [8, 9, 10, 11, 12, 13]. See also [14, 15, 16, 17].
In the literature of gossip algorithms, an important task is the design of fast and efficient algorithms. Surprisingly, to the best of our knowledge, there are no variants of gossip algorithms that converge to consensus with an accelerated linear rate. In this work, our focus is precisely this. We design two provably accelerated randomized gossip protocols which converge to consensus fast.
The average consensus problem. In the average consensus (AC) problem we are given an undirected connected network $G = (V, E)$ with node set $V = \{1, 2, \dots, n\}$ and edges $E$. Each node $i \in V$ holds a local value $c_i \in \mathbb{R}$. The goal of AC is for every node to compute the average of these private values, $\bar{c} := \frac{1}{n}\sum_{i=1}^{n} c_i$, in a distributed fashion. That is, the exchange of information can only occur between connected nodes (neighbors).
Main contributions. In this work, building upon a recent framework for the design and analysis of randomized gossip algorithms [11, 18], we present two novel and provably accelerated randomized gossip protocols where in each step all nodes of the network update their values using their own information but only a pair of them exchange messages. The accelerated convergence rates of the proposed protocols are obtained by establishing a connection with the area of accelerated randomized Kaczmarz methods for solving consistent linear systems.
To the best of our knowledge, our protocols are the first randomized gossip algorithms that converge to consensus with an accelerated linear rate. The theoretical results are validated via computational testing on typical network topologies.
Structure of the paper. Section 2 introduces important technical preliminaries and the background necessary for understanding our methods. Two accelerated variants of the randomized Kaczmarz (RK) method for solving linear systems and their theoretical convergence results are described. In Section 3 we present the two provably accelerated gossip protocols, along with some remarks on their implementation. Numerical evaluation of the new gossip protocols is presented in Section 4. Finally, concluding remarks are given in Section 5.
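Notation. The following notational conventions are used in this paper. We write $[n] := \{1, 2, \dots, n\}$. Boldface uppercase letters denote matrices; $\mathbf{I}$ is the identity matrix. By $\mathcal{L}$ we denote the solution set of the linear system $\mathbf{A}x = b$, where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. By $\mathbf{A}_{i:}$ and $\mathbf{A}_{:j}$ we indicate the $i$-th row and the $j$-th column of matrix $\mathbf{A}$, respectively. Throughout the paper, $x^*$ is the projection of $x^0$ onto $\mathcal{L}$ (that is, $x^*$ is the solution of the best approximation problem; see equation (2)). With $\lambda_{\min}^+$ we indicate the smallest nonzero eigenvalue of matrix $\mathbf{A}^\top\mathbf{A}$. The symbols $\|\cdot\|$ and $\|\cdot\|_F$ denote the Euclidean norm and the Frobenius norm, respectively. Finally, $x^k = (x_1^k, \dots, x_n^k) \in \mathbb{R}^n$ represents the vector with the local values of the $n$ nodes of the network at the $k$-th iteration. Here, $x_i^k$ denotes the value of node $i \in [n]$ at the $k$-th iteration.
2 Technical Preliminaries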
In this section we present the connections between the randomized Kaczmarz methods for solving linear systems and the gossip algorithms for solving the AC problem, as discussed in more detail in [11, 18]. In particular, we focus on the presentation of the two recently proposed accelerated variants of Kaczmarz methods and on their theoretical convergence analysis.
2.1 Kaczmarz-type methods and gossip algorithms
Kaczmarz-type methods are popular algorithms for solving linear systems with many equations. The randomized Kaczmarz method (RK) for solving consistent linear systems was first proposed and proved to converge with a linear rate in [19]. This work triggered much research into developing and analyzing randomized linear solvers, and several improved variants of RK have been proposed [20, 21, 22, 23, 24, 25, 26, 27, 28].
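In particular, in its simplest version, RK works as follows: in each step, one row $i \in [m]$ of matrix $\mathbf{A}$ is sampled with probability $p_i = \|\mathbf{A}_{i:}\|^2 / \|\mathbf{A}\|_F^2$ and then is used to obtain the next iterate by following the update rule:
$$x^{k+1} = x^k - \frac{\mathbf{A}_{i:}x^k - b_i}{\|\mathbf{A}_{i:}\|^2}\,\mathbf{A}_{i:}^\top. \tag{1}$$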
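For the case of consistent linear systems, it was shown that RK and its variants solve the following problem (known as the best approximation problem) [29, 30, 31]:
$$\min_{x \in \mathbb{R}^n} \ \|x - x^0\|^2 \quad \text{subject to} \quad \mathbf{A}x = b, \tag{2}$$
where $x^0$ is the initial vector of the method.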
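The update rule and the best approximation property can be illustrated with a minimal sketch. The function name `randomized_kaczmarz`, the list-of-rows matrix representation and the toy system below are illustrative assumptions, not part of the original method description; rows are sampled proportionally to their squared norms, as in the basic RK of [19].

```python
import random

def randomized_kaczmarz(A, b, x0, iters, seed=0):
    """Basic RK for a consistent system A x = b (A given as a list of rows)."""
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = list(x0)
    for _ in range(iters):
        # Sample row i with probability ||A_i||^2 / ||A||_F^2.
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        # Project the current iterate onto the hyperplane {z : A_i z = b_i}.
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        step = residual / row_norms_sq[i]
        x = [xj - step * a for xj, a in zip(x, A[i])]
    return x

# Rank-one consistent system: both rows describe the line x1 + x2 = 2,
# so the solution set is a line rather than a single point.
A = [[1.0, 1.0], [2.0, 2.0]]
b = [2.0, 4.0]
x0 = [3.0, 1.0]
x = randomized_kaczmarz(A, b, x0, iters=50)
# Among all solutions, the one closest to x0 = (3, 1) is (2, 0).
```

Because the system has infinitely many solutions, the example makes visible that RK does not return an arbitrary solution but the projection of the starting point onto the solution set.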
In [11] it was shown how RK works as a gossip algorithm when applied to a special linear system encoding the underlying network. The following definition is used to describe the class of linear systems considered here.
Definition 2.1 ([11])
A linear system $\mathbf{A}x = b$ is called an “average consensus (AC) system” when all of its solutions $x$ satisfy $x_i = x_j$ for all $(i, j) \in E$.
Many linear systems satisfy the above definition. In this work we focus on the case where $b = 0$ and $\mathbf{A}$ is the incidence matrix of $G$ (or its normalized form where $\|\mathbf{A}_{e:}\| = 1$ for every row $e$). In this case, the row of the system corresponding to edge $(i, j)$ directly encodes the constraint $x_i = x_j$.
Since the right-hand side of the above system is $0$, the update rule of equation (1) simplifies to:
$$x^{k+1} = x^k - \frac{\mathbf{A}_{i:}x^k}{\|\mathbf{A}_{i:}\|^2}\,\mathbf{A}_{i:}^\top.$$
In the case that the starting point is $x^0 = c$, it can be shown that RK solves the average consensus problem and that the above update rule is equivalent to the pairwise randomized gossip algorithm of [3] (see [11] for more details). The convergence performance of RK for solving the best approximation problem (and as a result the average consensus problem) is described by the following theorem.
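The equivalence with pairwise gossip can be checked directly. The sketch below assumes the unnormalized incidence matrix, whose row for edge $(i, j)$ has $+1$ in position $i$ and $-1$ in position $j$ (squared norm 2); the helper names and the four-node path graph are hypothetical choices made for illustration.

```python
import random

def rk_step(x, i, j):
    """One RK update for the incidence-matrix row of edge (i, j), with b = 0."""
    step = (x[i] - x[j]) / 2.0   # (A_e x) / ||A_e||^2, where ||A_e||^2 = 2
    x = list(x)
    x[i] -= step                 # both endpoints move to their average
    x[j] += step
    return x

def pairwise_gossip(edges, c, iters, seed=0):
    """Run RK on the AC system, i.e. randomized pairwise averaging."""
    rng = random.Random(seed)
    x = list(c)
    for _ in range(iters):
        # All rows have the same norm, so edges are sampled uniformly.
        i, j = edges[rng.randrange(len(edges))]
        x = rk_step(x, i, j)
    return x

edges = [(0, 1), (1, 2), (2, 3)]   # path graph on four nodes
c = [1.0, 2.0, 3.0, 6.0]           # private values, average 3.0
one_step = rk_step(c, 2, 3)        # nodes 2 and 3 both become (3 + 6) / 2
x = pairwise_gossip(edges, c, iters=500)
```

Each update leaves the sum of the values unchanged, which is why the iterates approach the average of the initial values rather than some other consensus value.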
2.2 Accelerated Kaczmarz methods
There are two different but very similar ways to accelerate the randomized Kaczmarz method. The first paper that proves asymptotic convergence with an accelerated linear rate is [27]. The proof technique is similar to the framework developed by Nesterov in [32] for the acceleration of coordinate descent methods. In [33, 34] a modified version for the selection of the parameters was proposed and a nonasymptotic accelerated linear rate was established. Algorithm 1 presents pseudocode of the Accelerated Kaczmarz method (AccRK); both variants can be cast as special cases of it by choosing the parameters appropriately.
2.3 Theoretical guarantees of AccRK
The two variants (Option 1 and Option 2) of AccRK are closely related; however, their convergence analyses are different. Below we present the theoretical guarantees of the two options as presented in [27] and [34].
Theorem 2.3 ([27])
Let be the sequence of random iterates produced by Algorithm 1 with the Option 1 for the parameters. Let and define and . Then for any we have that:
Corollary 1 ([27])
Note that as , we have that . This means that the decrease of the right-hand side is governed mainly by the behavior of the term in the denominator and, as a result, the method converges asymptotically with a decrease factor per iteration:
Thus, by choosing and for the case that is small, Algorithm 1 will have a significantly faster convergence rate than RK. Note that the above convergence results hold for normalized matrices $\mathbf{A}$, that is, matrices that have $\|\mathbf{A}_{i:}\| = 1$ for any $i \in [m]$.
Theorem 2.4 ([34])
Let and assume that . Let be the iterates of Algorithm 1 with the Option 2 for the parameters. Then
where
The above result implies that Algorithm 1 converges linearly with rate , which translates to a total of iterations to bring the quantity below . It can be shown that , (Lemma 2 in [34]) where is as defined in (3). Thus, which means that the rate of AccRK (Option 2) is always better than that of the RK which (see Theorem 2.2) is equal to for normalized matrices ().
3 Accelerated randomized gossip algorithms
In the previous section we presented the complexity guarantees of AccRK for solving consistent linear systems with normalized matrices. Now, let us explain how the two options of AccRK behave as gossip algorithms when they are used to solve the linear system $\mathbf{A}x = 0$, where $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the normalized incidence matrix of the network. That is, each row $e = (i, j)$ of $\mathbf{A}$ can be represented as $(\mathbf{A}_{e:})^\top = \frac{1}{\sqrt{2}}(f_i - f_j)$, where $f_i$ (resp. $f_j$) is the $i$-th (resp. $j$-th) unit coordinate vector in $\mathbb{R}^n$.
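By using this particular linear system, the expression that appears in steps 8 and 9 of AccRK takes the following form when the row $e = (i, j)$ is sampled (with $f_i$, $f_j$ the unit coordinate vectors):
$$\frac{\mathbf{A}_{e:}y^k - b_e}{\|\mathbf{A}_{e:}\|^2}\,\mathbf{A}_{e:}^\top = \frac{y_i^k - y_j^k}{2}\,(f_i - f_j).$$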
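The simplification can be verified in two lines. The derivation below is a sketch under the assumption that the expression in question is the usual Kaczmarz correction term evaluated at a generic vector $y$, with $b = 0$ and unit-norm rows $(\mathbf{A}_{e:})^\top = \frac{1}{\sqrt{2}}(f_i - f_j)$; the symbols $y$, $f_i$ and $f_j$ are notation chosen for this sketch.

```latex
\begin{align*}
\mathbf{A}_{e:}\,y &= \tfrac{1}{\sqrt{2}}\,(f_i - f_j)^\top y
                    = \tfrac{1}{\sqrt{2}}\,(y_i - y_j),
\qquad
\|\mathbf{A}_{e:}\|^2 = \tfrac{1}{2}\,\|f_i - f_j\|^2 = 1, \\[4pt]
\frac{\mathbf{A}_{e:}\,y - 0}{\|\mathbf{A}_{e:}\|^2}\,\mathbf{A}_{e:}^\top
  &= \tfrac{1}{\sqrt{2}}\,(y_i - y_j)\cdot\tfrac{1}{\sqrt{2}}\,(f_i - f_j)
   = \frac{y_i - y_j}{2}\,(f_i - f_j).
\end{align*}
```

Subtracting this term from $y$ replaces both $y_i$ and $y_j$ by $(y_i + y_j)/2$ and leaves every other coordinate unchanged, which is exactly a pairwise averaging step.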
Let be the Laplacian matrix of the network. For solving the above AC system (see Definition 2.1), the simple RK requires iterations to achieve expected accuracy . To understand the acceleration in the gossip framework this should be compared to the of AccRK (Option 1) and the of AccRK (Option 2).
Algorithm 2 describes in a single framework how the two variants of AccRK of Section 2.2 behave as gossip algorithms when they are used to solve the above linear system. Note that each node $i$ of the network has two local registers to save the quantities $v_i^k$ and $x_i^k$. In each step, using these two values, every node of the network (activated or not) computes the quantity $y_i^k$. Then, in the $k$-th iteration, the activated nodes $i$ and $j$ of the randomly selected edge $e = (i, j)$ exchange their values $y_i^k$ and $y_j^k$ and update the values of $x_i^k$, $x_j^k$ and $v_i^k$, $v_j^k$, as shown in Algorithm 2. The rest of the nodes use only their own $y_\ell^k$ to update the values of $v_\ell^k$ and $x_\ell^k$, without communicating with any other node.
The parameter $\lambda_{\min}^+$ can be estimated by all nodes in a decentralized manner using the method described in [35]. In order to implement this algorithm, we assume that all nodes have synchronized clocks and that they know the rate at which gossip updates are performed, so that inactive nodes also update their local values. This may not be feasible in all applications, but when it is possible (e.g., if nodes are equipped with inexpensive GPS receivers, or have reliable clocks) then they can benefit from the significant speedup achieved.
The selected node and node :

Any other node :
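The two-register protocol described above can be sketched as follows. This is an illustration under stated assumptions, not a transcription of Algorithm 2: the parameter recursion for $\gamma_k$, $\alpha_k$, $\beta_k$ follows the accelerated Kaczmarz scheme of [27] (Option 1) as we understand it, the function name `accelerated_gossip` is hypothetical, and the cycle graph with its closed-form eigenvalue is chosen only to keep the example self-contained.

```python
import math
import random

def accelerated_gossip(edges, c, lam, iters, seed=0):
    """Sketch of accelerated gossip; lam should not exceed lambda_min^+(A^T A)
    for the normalized incidence matrix A (assumed parameter choice of [27])."""
    rng = random.Random(seed)
    m = len(edges)
    x, v = list(c), list(c)          # two registers per node, v^0 = x^0 = c
    gamma_prev = 0.0
    for _ in range(iters):
        # gamma_k: larger root of g^2 - g/m = (1 - g*lam/m) * gamma_prev^2.
        half = (1.0 - lam * gamma_prev ** 2) / (2.0 * m)
        gamma = half + math.sqrt(half ** 2 + gamma_prev ** 2)
        alpha = (m - gamma * lam) / (gamma * (m * m - lam))
        beta = 1.0 - gamma * lam / m
        # Every node, activated or not, forms y from its own two registers.
        y = [alpha * vl + (1.0 - alpha) * xl for vl, xl in zip(v, x)]
        i, j = edges[rng.randrange(m)]       # uniformly sampled edge
        diff = (y[i] - y[j]) / 2.0
        # Local updates that need no communication.
        x = list(y)
        v = [beta * vl + (1.0 - beta) * yl for vl, yl in zip(v, y)]
        # Only the two activated nodes use each other's y value.
        x[i] -= diff
        x[j] += diff                         # x_i = x_j = (y_i + y_j) / 2
        v[i] -= gamma * diff
        v[j] += gamma * diff
        gamma_prev = gamma
    return x

n = 10
edges = [(i, (i + 1) % n) for i in range(n)]   # cycle graph
c = [float(i) for i in range(n)]               # private values, average 4.5
# For the cycle, A^T A = L / 2, so lambda_min^+ = 1 - cos(2*pi/n).
lam = 1.0 - math.cos(2.0 * math.pi / n)
x = accelerated_gossip(edges, c, lam, iters=5000)
```

In this sketch the correction added to node $i$ is always the negative of the one added to node $j$, so the sums of both registers are preserved and the consensus value is the average of the initial values.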
Related work on accelerated gossip algorithms: The idea of having gossip updates in a network with two registers in each node is not new. It was first proposed in [36], and its analysis under strong conditions was presented in [9]. There, local memory is exploited by installing shift registers at each agent, where the first register stores the agent’s current value and the second the agent’s value before the latest update. In [18], the Stochastic Heavy Ball (SHB) method is used for solving the AC problem, and an accelerated method is proposed which was shown to be faster in practice than the algorithm of [36, 9]. The work [18] is the first paper that presents gossip algorithms where in each step all nodes of the network update their values but only a subset of them exchange their private values.
4 Numerical Evaluation
We devote this section to a numerical evaluation of the proposed accelerated gossip protocols. In all of our experiments we compare the simple RK (equivalent to the pairwise gossip algorithm of [3]), the Stochastic Heavy Ball method (SHB) proposed in [18], and AccRK (Algorithm 2) with the two options for the selection of the parameters presented in Section 2.2. In comparing the methods we use the relative error measure $\|x^k - x^*\|^2 / \|x^0 - x^*\|^2$, where the starting vector of values $x^0 = c$ is always taken to be a Gaussian vector. For all of our experiments the horizontal axis represents the number of iterations. The networks used in the experiments are the cycle (ring graph), the two-dimensional grid and the random geometric graph (RGG) with radius . Code was written in Julia 0.6.3.
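The experimental setup can be sketched with a few helpers. Everything here is an assumption made for illustration: the function names are hypothetical, the relative error is taken to be $\|x^k - x^*\|^2 / \|x^0 - x^*\|^2$ with $x^*$ the vector whose entries all equal the average of the initial values, and the RGG radius is left as a free argument. The original experiments were written in Julia; Python is used here only for the sketch.

```python
import math
import random

def cycle_edges(n):
    """Ring graph: node i is linked to node i + 1 (mod n)."""
    return [(i, (i + 1) % n) for i in range(n)]

def grid_edges(rows, cols):
    """Two-dimensional grid with nodes numbered row by row."""
    edges = []
    for r in range(rows):
        for col in range(cols):
            node = r * cols + col
            if col + 1 < cols:
                edges.append((node, node + 1))      # right neighbor
            if r + 1 < rows:
                edges.append((node, node + cols))   # neighbor below
    return edges

def rgg_edges(n, radius, seed=0):
    """Random geometric graph on the unit square.
    Connectivity is not checked here and must be verified separately."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]) <= radius]

def gaussian_start(n, seed=0):
    """Standard Gaussian vector of initial node values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def relative_error(x, x0):
    """Squared distance to consensus, relative to the starting distance."""
    mean = sum(x0) / len(x0)
    num = sum((xi - mean) ** 2 for xi in x)
    den = sum((xi - mean) ** 2 for xi in x0)
    return num / den
```

An edge list produced by any of these helpers can be passed to a gossip routine, and `relative_error` can then be recorded at every iteration to produce curves against the iteration count.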
For the implementation of SHB we use the same parameters as those used in [18]. For the AccRK (Option 1) we use . Note that for all networks under study the two proposed protocols are faster than both the pairwise gossip algorithm of [3] and the SHB of [18].
5 Conclusion and Future Research
We proposed novel provably accelerated randomized gossip algorithms for solving the AC problem. Our approach is based on connections established between gossip algorithms and Kaczmarz methods for solving linear systems. We believe that many novel and efficient gossip protocols can be discovered using results from the literature on Kaczmarz methods, either by using different AC linear systems or by using Kaczmarz-type algorithms other than the one presented in this manuscript. We speculate that the gossip algorithms presented in this work can be extended to the more general setting of minimizing the average of convex functions in a decentralized way [12]. While preparing this work we became aware of [37], where an accelerated gossip algorithm is developed for solving the dual of the best approximation problem (2) using the accelerated coordinate descent method of [38]. A comparison of our protocols and the algorithm of [37] is ongoing work.
References
 [1] Morris H DeGroot, “Reaching a consensus,” Journal of the American Statistical Association, vol. 69, no. 345, pp. 118–121, 1974.
 [2] John Tsitsiklis, Dimitri Bertsekas, and Michael Athans, “Distributed asynchronous deterministic and stochastic gradient optimization algorithms,” IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803–812, 1986.
 [3] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508–2530, 2006.
 [4] L. Xiao, S. Boyd, and S. Lall, “A scheme for robust distributed sensor fusion based on average consensus,” in Information Processing in Sensor Networks, 2005. IPSN 2005. Fourth International Symposium on. IEEE, 2005, pp. 63–70.
 [5] G. Cybenko, “Dynamic load balancing for distributed memory multiprocessors,” J. Parallel Distrib. Comput., vol. 7, no. 2, pp. 279–301, 1989.
 [6] N.M. Freris and A. Zouzias, “Fast distributed smoothing of relative measurements,” in Decision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, 2012, pp. 1411–1416.
 [7] A.G. Dimakis, S. Kar, J.M.F. Moura, M.G. Rabbat, and A. Scaglione, “Gossip algorithms for distributed signal processing,” Proceedings of the IEEE, vol. 98, no. 11, pp. 1847–1864, 2010.
 [8] A. Zouzias and N.M. Freris, “Randomized gossip algorithms for solving Laplacian systems,” in Control Conference (ECC), 2015 European. IEEE, 2015, pp. 1920–1925.
 [9] J. Liu, B.D.O. Anderson, M. Cao, and A.S. Morse, “Analysis of accelerated gossip algorithms,” Automatica, vol. 49, no. 4, pp. 873–883, 2013.
 [10] A. Olshevsky, “Linear time average consensus on fixed graphs and implications for decentralized optimization and multiagent control,” arXiv preprint arXiv:1411.4186, 2014.
 [11] N. Loizou and P. Richtárik, “A new perspective on randomized gossip algorithms,” in 4th IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2016.
 [12] A. Nedić, A. Olshevsky, and M. G. Rabbat, “Network topology and communication-computation tradeoffs in decentralized optimization,” Proceedings of the IEEE, vol. 106, no. 5, pp. 953–976, 2018.
 [13] N. S. Aybat and M. Gürbüzbalaban, “Decentralized computation of effective resistances and acceleration of consensus algorithms,” in Signal and Information Processing (GlobalSIP), 2017 IEEE Global Conference on. IEEE, 2017, pp. 538–542.
 [14] A.G. Dimakis, A.D. Sarwate, and M.J. Wainwright, “Geographic gossip: Efficient averaging for sensor networks,” IEEE Trans. Signal Process., vol. 56, no. 3, pp. 1205–1216, 2008.
 [15] T.C. Aysal, M.E. Yildiz, A.D. Sarwate, and A. Scaglione, “Broadcast gossip algorithms for consensus,” IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2748–2761, 2009.
 [16] A. Olshevsky and J.N. Tsitsiklis, “Convergence speed in distributed consensus and averaging,” SIAM J. Control Optim., vol. 48, no. 1, pp. 33–55, 2009.
 [17] F. Hanzely, J. Konečný, N. Loizou, P. Richtárik, and D. Grishchenko, “Privacy preserving randomized gossip algorithms,” arXiv preprint arXiv:1706.07636, 2017.
 [18] Nicolas Loizou and Peter Richtárik, “Accelerated gossip via stochastic heavy ball method,” Allerton Conference on Communication, Control, and Computing, [arXiv preprint arXiv:1809.08657], 2018.
 [19] T. Strohmer and R. Vershynin, “A randomized Kaczmarz algorithm with exponential convergence,” J. Fourier Anal. Appl., vol. 15, no. 2, pp. 262–278, 2009.
 [20] D. Needell, “Randomized Kaczmarz solver for noisy linear systems,” BIT Numerical Mathematics, vol. 50, no. 2, pp. 395–403, 2010.
 [21] D. Needell and J.A. Tropp, “Paved with good intentions: analysis of a randomized block Kaczmarz method,” Linear Algebra and its Applications, vol. 441, pp. 199–221, 2014.
 [22] Y.C. Eldar and D. Needell, “Acceleration of randomized Kaczmarz method via the Johnson–Lindenstrauss lemma,” Numerical Algorithms, vol. 58, no. 2, pp. 163–177, 2011.
 [23] A. Ma, D. Needell, and A. Ramdas, “Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods,” SIAM Journal on Matrix Analysis and Applications, vol. 36, no. 4, pp. 1590–1604, 2015.
 [24] A. Zouzias and N.M. Freris, “Randomized extended Kaczmarz for solving least squares,” SIAM. J. Matrix Anal. & Appl., vol. 34, no. 2, pp. 773–793, 2013.
 [25] D. Needell, R. Zhao, and A. Zouzias, “Randomized block Kaczmarz method with projection for solving least squares,” Linear Algebra and its Applications, vol. 484, pp. 322–343, 2015.
 [26] F. Schöpfer and D.A. Lorenz, “Linear convergence of the randomized sparse Kaczmarz method,” arXiv preprint arXiv:1610.02889, 2016.
 [27] J. Liu and S. Wright, “An accelerated randomized Kaczmarz algorithm,” Mathematics of Computation, vol. 85, no. 297, pp. 153–178, 2016.
 [28] N. Loizou and P. Richtárik, “Linearly convergent stochastic heavy ball method for minimizing generalization error,” NIPS Workshop on Optimization for Machine Learning [arXiv preprint arXiv:1710.10737], 2017.
 [29] R.M. Gower and P. Richtárik, “Randomized iterative methods for linear systems,” SIAM. J. Matrix Anal. & Appl., vol. 36, no. 4, pp. 1660–1690, 2015.
 [30] R.M. Gower and P. Richtárik, “Stochastic dual ascent for solving linear systems,” arXiv preprint arXiv:1512.06890, 2015.
 [31] N. Loizou and P. Richtárik, “Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods,” arXiv preprint arXiv:1712.09677, 2017.
 [32] Y. Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM Journal on Optimization, vol. 22, no. 2, pp. 341–362, 2012.
 [33] S. Tu, S. Venkataraman, A.C. Wilson, A. Gittens, M.I. Jordan, and B. Recht, “Breaking locality accelerates block Gauss-Seidel,” in ICML, 2017.
 [34] R.M. Gower, F. Hanzely, P. Richtárik, and S. Stich, “Accelerated stochastic matrix inversion: general theory and speeding up BFGS rules for faster second-order optimization,” arXiv preprint arXiv:1802.04079, 2018.

 [35] T. Charalambous, M.G. Rabbat, M. Johansson, and C.N. Hadjicostis, “Distributed finite-time computation of digraph parameters: Left eigenvector, out-degree and spectrum,” IEEE Trans. Control of Network Systems, vol. 3, no. 2, pp. 137–148, June 2016.
 [36] M. Cao, D.A. Spielman, and E.M. Yeh, “Accelerated gossip algorithms for distributed computation,” in Proc. of the 44th Annual Allerton Conference on Communication, Control, and Computation, 2006, pp. 952–959.
 [37] H. Hendrikx, L. Massoulié, and F. Bach, “Accelerated decentralized optimization with local updates for smooth and strongly convex objectives,” arXiv preprint arXiv:1810.02660, 2018.
 [38] Y. Nesterov and S.U. Stich, “Efficiency of the accelerated coordinate descent method on structured optimization problems,” SIAM Journal on Optimization, vol. 27, no. 1, pp. 110–123, 2017.