IntSGD: Floatless Compression of Stochastic Gradients

02/16/2021
by Konstantin Mishchenko, et al.

We propose a family of lossy integer compressions for Stochastic Gradient Descent (SGD) that do not communicate a single float. This is achieved by multiplying floating-point vectors by a number known to every device and then rounding the result to integers. Our theory shows that, when the vectors are scaled properly, the iteration complexity of SGD does not change up to constant factors. Moreover, this holds for both convex and non-convex functions, with and without overparameterization. In contrast to other compression-based algorithms, ours preserves the convergence rate of SGD even on non-smooth problems. Finally, we show that when the data is significantly heterogeneous, it may become increasingly hard to keep the integers bounded, and we propose an alternative algorithm, IntDIANA, to solve this type of problem.
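
The scale-then-round idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which rounding rule is used or how the shared scale is chosen, so this example assumes unbiased randomized rounding and a fixed, hypothetical scale `alpha` known to every worker. The function names `int_compress` and `aggregate` are also made up for illustration.

```python
import math
import random


def int_compress(vec, alpha, rng):
    """Scale each entry by alpha, then round to an integer at random.

    Assumption: rounding up happens with probability equal to the
    fractional part, so the expected value of each output is alpha * x.
    """
    out = []
    for x in vec:
        y = alpha * x
        low = math.floor(y)  # largest integer <= y
        out.append(low + (1 if rng.random() < y - low else 0))
    return out


def aggregate(int_vecs, alpha):
    """Sum the integer vectors from all workers, then undo the scaling
    and divide by the number of workers to get an average gradient."""
    n = len(int_vecs)
    sums = [sum(col) for col in zip(*int_vecs)]  # integer-only reduction
    return [s / (alpha * n) for s in sums]


rng = random.Random(0)
alpha = 1000.0  # hypothetical shared scale; only integers are exchanged
grads = [[0.1234, -0.5], [0.2, 0.25]]  # toy per-worker gradients

compressed = [int_compress(g, alpha, rng) for g in grads]
estimate = aggregate(compressed, alpha)
true_mean = [sum(col) / len(grads) for col in zip(*grads)]
```

In this sketch each coordinate of `estimate` differs from `true_mean` by less than `1 / alpha`, because every worker's rounding error is below one integer unit. A larger `alpha` reduces the error but makes the transmitted integers larger, which is the boundedness trade-off the abstract mentions for heterogeneous data.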

05/30/2019

On the Convergence of Memory-Based Distributed SGD

Distributed stochastic gradient descent (DSGD) has been widely used for ...
06/10/2020

Random Reshuffling: Simple Analysis with Vast Improvements

Random Reshuffling (RR) is an algorithm for minimizing finite-sum functi...
10/04/2019

The Complexity of Finding Stationary Points with Stochastic Gradient Descent

We study the iteration complexity of stochastic gradient descent (SGD) f...
02/22/2022

Asynchronous Fully-Decentralized SGD in the Cluster-Based Model

This paper presents fault-tolerant asynchronous Stochastic Gradient Desc...
10/06/2022

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method f...
06/16/2021

Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

Geometric median (Gm) is a classical method in statistics for achieving ...