AdaSGD: Bridging the gap between SGD and Adam

06/30/2020
by Jiaxuan Wang, et al.

In the context of stochastic gradient descent (SGD) and adaptive moment estimation (Adam), researchers have recently proposed optimization techniques that transition from Adam to SGD with the goal of improving both convergence and generalization performance. However, precisely how each approach trades off early progress and generalization is not well understood; thus, it is unclear when, or even if, one should transition from one approach to the other. In this work, by first studying the convex setting, we identify potential contributors to the observed differences in performance between SGD and Adam. In particular, we provide theoretical insights into when and why Adam outperforms SGD and vice versa. We address the performance gap by adapting a single global learning rate for SGD, an approach we refer to as AdaSGD. We justify this proposed approach with empirical analyses in non-convex settings. On several datasets spanning three different domains, we demonstrate how AdaSGD combines the benefits of both SGD and Adam, eliminating the need for approaches that transition from Adam to SGD.
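The abstract's one-line description of AdaSGD, plain SGD whose single global learning rate is adapted rather than Adam's per-parameter rates, can be made concrete with a short sketch. The PyTorch code below is an illustrative reading of that description, not the authors' exact algorithm: the class name `AdaSGD`, the choice of an exponential moving average of the mean squared gradient as the global statistic, and the Adam-style bias correction and default hyperparameters are all assumptions added here.

```python
import torch


class AdaSGD(torch.optim.Optimizer):
    """Illustrative SGD variant with a single global adaptive learning rate.

    Unlike Adam, which keeps one second-moment estimate per parameter, this
    sketch rescales the whole SGD step by one scalar statistic shared across
    all parameters (a hypothetical reading of the abstract; hyperparameters
    and bias correction follow Adam's conventions for familiarity).
    """

    def __init__(self, params, lr=1e-3, beta=0.999, eps=1e-8):
        super().__init__(params, dict(lr=lr, beta=beta, eps=eps))
        self.v = 0.0  # global second-moment estimate (a single scalar)
        self.t = 0    # step counter, used for bias correction

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        self.t += 1

        # Mean squared gradient over *all* parameters, not per coordinate.
        sq_sum, count = 0.0, 0
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    sq_sum += p.grad.pow(2).sum().item()
                    count += p.grad.numel()
        beta = self.param_groups[0]["beta"]
        self.v = beta * self.v + (1 - beta) * (sq_sum / max(count, 1))
        v_hat = self.v / (1 - beta ** self.t)  # Adam-style bias correction

        # Plain SGD direction, rescaled by the one shared statistic.
        for group in self.param_groups:
            scale = group["lr"] / (v_hat ** 0.5 + group["eps"])
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-scale)
        return loss
```

In this reading the optimizer drops in like any other (`opt = AdaSGD(model.parameters(), lr=0.1)` followed by the usual `loss.backward(); opt.step(); opt.zero_grad()`): the update direction is exactly the raw SGD gradient and only its magnitude is rescaled by one shared statistic, which is the sense in which such a method sits between SGD and Adam.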

