Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

10/27/2020
by Marten van Dijk, et al.

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD): multiple threads access a common repository of training data in parallel, perform SGD iterations, and update a shared state that represents the jointly learned (global) model. We consider big-data analysis in which training data is distributed among local data sets, and we wish to move the SGD computations to the local compute nodes where that data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start with small mini-batch sizes and increase them to larger ones in order to reduce communication cost (the number of interaction rounds with the aggregator). We give a tight and novel non-trivial convergence analysis for strongly convex problems which does not use the bounded-gradient assumption seen in many existing publications. The tightness is a consequence of our proofs of lower and upper bounds on the convergence rate, which differ by only a constant factor. We present experimental results for plain convex and non-convex problems, for both biased and unbiased local data sets.
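
To make the setup concrete, here is a minimal Python/NumPy sketch of local nodes running SGD on their own data shards with a mini-batch size that grows linearly per round, while a central aggregator folds their updates into a shared model. The particular batch-size schedule (4 per round), step sizes, synthetic data, and the sequential averaging step are illustrative assumptions, not the paper's exact algorithm; in particular, the paper's aggregator processes updates asynchronously, Hogwild!-style, whereas this simulation is synchronous.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem, split into biased local shards.
d, n_clients, n_per_client = 10, 4, 500
w_true = rng.normal(size=d)
shards = []
for c in range(n_clients):
    X = rng.normal(loc=c, size=(n_per_client, d))   # biased local data
    y = X @ w_true + 0.1 * rng.normal(size=n_per_client)
    shards.append((X, y))

def local_sgd(w, X, y, batch_size, steps, lr):
    """Run `steps` mini-batch SGD iterations locally; return the updated local model."""
    w = w.copy()
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
    return w

w_global = np.zeros(d)                      # shared (global) model kept by the aggregator
for rnd in range(1, 21):
    batch_size = 4 * rnd                    # linearly increasing mini-batch size
    lr = 0.01 / rnd                         # diminishing step size (illustrative choice)
    for X, y in shards:                     # each local compute node works on its shard
        w_local = local_sgd(w_global, X, y, batch_size, steps=5, lr=lr)
        # Aggregator applies the node's update to the shared model
        # (done sequentially here; the paper's aggregator does this asynchronously).
        w_global += (w_local - w_global) / n_clients
    print(rnd, np.linalg.norm(w_global - w_true))

Running the sketch prints the distance to the generating model after each round; because the mini-batch size grows with the round index, later rounds perform more local work per interaction with the aggregator, which is the communication-saving effect described above.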

Related research

STL-SGD: Speeding Up Local SGD with Stagewise Communication Period (06/11/2020)
Distributed parallel stochastic gradient descent algorithms are workhors...

Convergence Analysis of Decentralized ASGD (09/07/2023)
Over the last decades, Stochastic Gradient Descent (SGD) has been intens...

On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization (05/10/2019)
For SGD based distributed stochastic optimization, computation complexit...

The Role of Local Steps in Local SGD (03/14/2022)
We consider the distributed stochastic optimization problem where n agen...

AET-SGD: Asynchronous Event-triggered Stochastic Gradient Descent (12/27/2021)
Communication cost is the main bottleneck for the design of effective di...

Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function (05/24/2023)
With multiple iterations of updates, local statistical gradient descent ...

Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles (07/10/2023)
In this paper, we provide a novel framework for the analysis of generali...
