Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
We consider decentralized stochastic optimization with the objective function (e.g. data samples for a machine learning task) being distributed over n machines that can only communicate with their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by ω ≤ 1 (ω = 1 meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate O(1/(nT) + 1/(Tδ^2ω)^2) for strongly convex objectives, where T denotes the number of iterations and δ the eigengap of the connectivity matrix. Although compression quality and network connectivity affect the higher-order terms, the first term in the rate, O(1/(nT)), is the same as for the centralized baseline with exact communication. We (ii) present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time O(1/(δ^2ω) log(1/ϵ)) for accuracy ϵ > 0. This is, to the best of our knowledge, the first gossip algorithm that supports arbitrary compressed messages for ω > 0 and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms outperform the respective state-of-the-art baselines and that CHOCO-SGD can reduce communication by at least two orders of magnitude.
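To illustrate the compressed-communication idea, the sketch below runs a consensus scheme in the spirit of CHOCO-GOSSIP on a small ring network with a top-k sparsifier: each node only transmits a compressed difference between its local value and a publicly known estimate. The topology, mixing weights, step size, sparsity level, and all numerical constants are illustrative assumptions, not the paper's exact parameters.

```python
# Minimal NumPy sketch of compressed gossip for average consensus, in the
# spirit of CHOCO-GOSSIP. The ring topology, mixing matrix, top-k sparsifier,
# step size gamma, and all numbers below are illustrative assumptions, not
# the paper's exact algorithmic constants.
import numpy as np

def top_k(v, k):
    """Biased sparsification: keep only the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n nodes."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    return W

def compressed_gossip(x0, W, gamma=0.03, k=5, iters=3000):
    """Average consensus where nodes exchange only compressed differences.

    Each node i holds a local value x[i] and a publicly known estimate
    x_hat[i]; per round it broadcasts only the sparsified difference
    top_k(x[i] - x_hat[i], k), so neighbors can update x_hat[i], and then
    takes a gossip step on the public estimates.
    """
    x = x0.copy()             # local values, shape (n, d)
    x_hat = np.zeros_like(x)  # estimates of the local values known to neighbors
    n = x.shape[0]
    for _ in range(iters):
        q = np.stack([top_k(x[i] - x_hat[i], k) for i in range(n)])
        x_hat += q                             # all nodes update the public copies
        x += gamma * (W - np.eye(n)) @ x_hat   # consensus step on the estimates
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(8, 10))              # 8 nodes, 10-dimensional local values
    avg = x0.mean(axis=0)
    x = compressed_gossip(x0, ring_mixing_matrix(8))
    print("initial max deviation from average:", np.abs(x0 - avg).max())
    print("final   max deviation from average:", np.abs(x - avg).max())
```

The small consensus step size reflects the role of δ and ω in the analysis: the more aggressive the compression and the worse the connectivity, the smaller the step size needs to be for the public estimates to keep tracking the local values.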