
PopSGD: Decentralized Stochastic Gradient Descent in the Population Model

10/27/2019
by Giorgi Nadiradze, et al.

The population model is a standard way to represent large-scale decentralized distributed systems, in which agents with limited computational power interact in randomly chosen pairs in order to collectively solve global computational tasks. In contrast with synchronous gossip models, nodes are anonymous, lack a common notion of time, and have no control over their scheduling. In this paper, we examine whether large-scale distributed optimization can be performed in this extremely restrictive setting. We introduce and analyze a natural decentralized variant of stochastic gradient descent (SGD), called PopSGD, in which every node maintains a local parameter and is able to compute stochastic gradients with respect to this parameter. In every pairwise interaction, each of the two agents performs a stochastic gradient step on its local parameter, followed by averaging of the two models. We prove that, under standard assumptions, SGD can converge even in this extremely loose, decentralized setting, for both convex and non-convex objectives. Moreover, surprisingly, in the convex case the algorithm can achieve linear speedup in the number of nodes n. Our analysis leverages a new technical connection between decentralized SGD and randomized load-balancing, which enables us to tightly bound the concentration of the node parameters. We validate our analysis through experiments, showing that PopSGD can achieve convergence and speedup for large-scale distributed learning tasks in a supercomputing environment.
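To make the interaction rule concrete, the sketch below simulates the pairwise update described in the abstract under simplifying assumptions: a synthetic least-squares objective, a fixed learning rate rather than the step-size schedule analyzed in the paper, and a sequential scheduler that draws one random pair per step. The names (pop_sgd, stoch_grad, etc.) are illustrative and not taken from the paper.

```python
import numpy as np

def pop_sgd(n_agents, dim, data, eta, n_interactions, seed=0):
    """Toy PopSGD simulation: agents hold local models; in each interaction a
    random pair is chosen, each agent takes a local SGD step, and the pair
    then averages their two models."""
    rng = np.random.default_rng(seed)
    A, b = data                        # least-squares data: A x ~ b
    X = np.zeros((n_agents, dim))      # one local parameter vector per agent

    def stoch_grad(x):
        k = rng.integers(len(b))       # sample one data point
        return (A[k] @ x - b[k]) * A[k]

    for _ in range(n_interactions):
        i, j = rng.choice(n_agents, size=2, replace=False)  # random pair
        X[i] -= eta * stoch_grad(X[i])    # local SGD step at agent i
        X[j] -= eta * stoch_grad(X[j])    # local SGD step at agent j
        avg = (X[i] + X[j]) / 2.0         # pairwise model averaging
        X[i] = avg
        X[j] = avg

    # One common way to read out a result: average the local models.
    return X.mean(axis=0)

# Example usage on a synthetic least-squares problem.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 5))
    x_true = rng.normal(size=5)
    b = A @ x_true + 0.01 * rng.normal(size=200)
    x_hat = pop_sgd(n_agents=16, dim=5, data=(A, b), eta=0.05,
                    n_interactions=20000)
    print("parameter error:", np.linalg.norm(x_hat - x_true))
```

In this toy setting the local models stay close to each other because every interaction both takes a gradient step and averages the pair, which mirrors the load-balancing-style concentration argument mentioned in the abstract; the sketch is not the paper's analyzed protocol, only an executable illustration of the interaction rule.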


Related research

SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization (05/13/2020)
In this paper, we consider the problem of communication-efficient decent...

DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression (05/15/2019)
A standard approach in large scale machine learning is distributed stoch...

Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling (09/29/2017)
When using stochastic gradient descent to solve large-scale machine lear...

Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent (01/16/2020)
Machine learning has made tremendous progress in recent years, with mode...

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer (12/26/2017)
We explore scaling of the standard distributed Tensorflow with GRPC prim...

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations (08/12/2019)
Load imbalance pervasively exists in distributed deep learning training ...