On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization

05/10/2019
by   Hao Yu, et al.

For SGD-based distributed stochastic optimization, computation complexity, measured by the convergence rate in terms of the number of stochastic gradient calls, and communication complexity, measured by the number of inter-node communication rounds, are the two most important performance metrics. The classical data-parallel implementation of SGD over N workers can achieve linear speedup of its convergence rate but incurs an inter-node communication round at each batch. We study the benefit of using dynamically increasing batch sizes in parallel SGD for stochastic non-convex optimization by characterizing the attained convergence rate and the required number of communication rounds. We show that for stochastic non-convex optimization under the Polyak-Łojasiewicz (P-L) condition, the classical data-parallel SGD with exponentially increasing batch sizes can achieve the fastest known O(1/(NT)) convergence with linear speedup using only log(T) communication rounds. For general stochastic non-convex optimization, we propose a Catalyst-like algorithm to achieve the fastest known O(1/√(NT)) convergence with only O(√(NT) log(T/N)) communication rounds.
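To make the communication-saving idea concrete, here is a minimal single-process sketch of data-parallel SGD with exponentially increasing batch sizes. It is an illustrative simulation, not the authors' implementation: the least-squares objective, the doubling schedule, the learning rate, and the N simulated workers are all assumptions chosen for illustration. Each iteration performs one "communication round" (averaging the N worker gradients), so doubling the batch size means the number of rounds grows only logarithmically in the total gradient budget T.

```python
import numpy as np

# Illustrative sketch (assumed setup): minimize f(x) = E[(a^T x - b)^2]
# with synthetic data; N "workers" each draw a local mini-batch, and one
# inter-node communication round averages their gradients per iteration.

rng = np.random.default_rng(0)
d, N = 10, 4                 # dimension, number of simulated workers
x = np.zeros(d)              # shared model, synchronized each round
x_star = rng.normal(size=d)  # ground-truth parameters for synthetic data

def stochastic_grad(x, batch_size):
    """One worker's averaged stochastic gradient over a local mini-batch."""
    A = rng.normal(size=(batch_size, d))
    b = A @ x_star + 0.1 * rng.normal(size=batch_size)
    return 2 * A.T @ (A @ x - b) / batch_size

lr, batch = 0.01, 8          # assumed step size and initial batch size
budget = 100_000             # total stochastic gradient budget T
total_grads, rounds = 0, 0

while total_grads < budget:
    # Each of the N workers computes a gradient on its own mini-batch...
    grads = [stochastic_grad(x, batch) for _ in range(N)]
    # ...then a single communication round averages them and updates x.
    x -= lr * np.mean(grads, axis=0)
    total_grads += N * batch
    rounds += 1
    batch *= 2               # exponentially increasing batch size

print(f"gradients used: {total_grads}, communication rounds: {rounds}")
```

With the doubling schedule above, the round count printed at the end scales like log(T) rather than T, which is the trade-off the abstract quantifies for the P-L case.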

