Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD

03/13/2022
by Bapi Chatterjee, et al.

Powered by the simplicity of lock-free asynchrony, Hogwild! is a go-to approach for parallelizing SGD in a shared-memory setting. Despite its popularity and concomitant extensions, such as PASSM+, wherein concurrent processes update a shared model with partitioned gradients, scaling it to decentralized workers has, surprisingly, remained relatively unexplored. To our knowledge, there is no convergence theory for such methods, nor are there systematic numerical comparisons evaluating their speed-up. In this paper, we propose an algorithm that combines a decentralized distributed-memory computing architecture with multiprocessing parallel shared-memory SGD running on each node. Our scheme is based on the following algorithmic tools and features: (a) asynchronous local gradient updates on the shared memory of each worker, (b) partial backpropagation, and (c) non-blocking in-place averaging of the local models. We prove that our method guarantees ergodic convergence rates for non-convex objectives. On the practical side, we show that the proposed method exhibits improved throughput and competitive accuracy on standard image classification benchmarks with the CIFAR-10, CIFAR-100, and ImageNet datasets. Our code is available at https://github.com/bapi/LPP-SGD.
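For readers unfamiliar with the building block the abstract refers to, the sketch below illustrates Hogwild!-style lock-free shared-memory SGD in PyTorch: several worker processes update the same shared-memory parameters without locks. This is an illustrative assumption, not the authors' LPP-SGD implementation (see the linked repository); the paper's additional components, partial backpropagation and non-blocking in-place averaging of local models across nodes, are not shown, and all names below are hypothetical.

```python
# Minimal sketch (assumption): Hogwild!-style lock-free shared-memory SGD in PyTorch.
# This is NOT the authors' LPP-SGD code; it only shows the shared-memory asynchrony
# that the paper's decentralized scheme builds on.
import torch
import torch.nn as nn
import torch.multiprocessing as mp


def worker(shared_model, data, target, lr=0.1, steps=100):
    # Each process optimizes the SAME parameter tensors, which live in shared
    # memory; steps are applied without locks (Hogwild!-style asynchrony).
    opt = torch.optim.SGD(shared_model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(shared_model(data), target)
        loss.backward()
        opt.step()  # may overlap with updates from other workers


if __name__ == "__main__":
    model = nn.Linear(32, 10)
    model.share_memory()  # place parameters in shared memory
    data, target = torch.randn(64, 32), torch.randint(0, 10, (64,))
    procs = [mp.Process(target=worker, args=(model, data, target)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

In the decentralized setting studied in the paper, each node would run such shared-memory workers locally and additionally exchange and average its local model with other nodes in a non-blocking fashion; that inter-node part is omitted here.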

