Adaptive Step-Size Methods for Compressed SGD

07/20/2022
by Adarsh M. Subramaniam et al.

Compressed Stochastic Gradient Descent (SGD) algorithms have recently been proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning. Existing compressed SGD algorithms assume non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. In practice, these step-sizes are typically fine-tuned to the dataset and the learning algorithm to obtain good empirical performance. Such fine-tuning may be impractical in many learning scenarios, so it is of interest to study compressed SGD with adaptive step-sizes. Motivated by prior work on adaptive step-size methods for SGD that train neural networks efficiently in the uncompressed setting, we develop an adaptive step-size method for compressed SGD. In particular, we introduce a scaling technique for the descent step in compressed SGD, which we use to establish order-optimal convergence rates for convex and smooth and for strongly convex and smooth objectives under an interpolation condition, and for non-convex objectives under a strong growth condition. We also show through simulation examples that without this scaling, the algorithm can fail to converge. Finally, we present experimental results on deep neural networks for real-world datasets, compare our proposed algorithm with previously proposed compressed SGD methods in the literature, and demonstrate improved performance on ResNet-18, ResNet-34, and DenseNet architectures on the CIFAR-100 and CIFAR-10 datasets at various levels of compression.
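To make the setting concrete, the following is a minimal sketch of a single compressed SGD update with an adaptive step size. It assumes top-k sparsification as the compressor and a stochastic Polyak-type step size, and the scaling factor applied to the descent step is purely illustrative; none of these choices are taken from the paper, whose exact scaling rule is not given in the abstract.

import numpy as np

def top_k(g, k):
    # Keep the k largest-magnitude entries of g and zero out the rest.
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def compressed_sgd_step(w, grad_fn, loss_fn, k, eps=1e-8, gamma_max=10.0):
    g = grad_fn(w)                    # (stochastic) gradient at w
    gc = top_k(g, k)                  # compressed gradient that would be communicated
    # Polyak-type adaptive step size; assumes interpolation, i.e. optimal loss near zero.
    gamma = min(loss_fn(w) / (np.dot(g, g) + eps), gamma_max)
    # Illustrative scaling of the descent step: rescale the compressed direction
    # so its energy matches that of the uncompressed gradient.
    scale = np.dot(g, g) / (np.dot(gc, gc) + eps)
    return w - gamma * scale * gc

# Toy usage on a least-squares problem where interpolation holds (A w* = b exactly).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
b = A @ rng.normal(size=10)
loss = lambda w: 0.5 * np.mean((A @ w - b) ** 2)
grad = lambda w: A.T @ (A @ w - b) / A.shape[0]   # full-batch gradient for simplicity

w = np.zeros(10)
for _ in range(500):
    w = compressed_sgd_step(w, grad, loss, k=3)
print("final loss:", loss(w))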


