Step-size Adaptation Using Exponentiated Gradient Updates

01/31/2022
by Ehsan Amid, et al.

Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods depends heavily on a carefully tuned learning rate schedule. We show that in many large-scale applications, augmenting a given optimizer with an adaptive step-size tuning method greatly improves performance. More precisely, we maintain a global step-size scale for the update as well as a gain factor for each coordinate. We adjust the global scale based on the alignment of the average gradient and the current gradient vectors, and a similar approach is used to update the local gain factors. This type of step-size scale tuning has been done before with gradient descent updates. In this paper, we instead update the step-size scale and the gain variables with exponentiated gradient updates. Experimentally, we show that our approach achieves compelling accuracy on standard models without any specially tuned learning rate schedule. We also show the effectiveness of our approach for quickly adapting to distribution shifts in the data during training.
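
To make the idea concrete, here is a minimal sketch (not the paper's exact algorithm) of a plain gradient step whose global scale and per-coordinate gains are adapted multiplicatively, exponentiated-gradient style, from the alignment between the current gradient and a running average of past gradients. The hyperparameter names (base_lr, meta_lr, beta) and the normalization and clipping choices are illustrative assumptions.

```python
# Sketch: multiplicative (exponentiated-gradient style) adaptation of a global
# step-size scale and per-coordinate gains, driven by gradient alignment.
import numpy as np


class EGStepSizeAdapter:
    def __init__(self, dim, base_lr=0.01, meta_lr=0.1, beta=0.9, eps=1e-8):
        self.base_lr = base_lr        # step size of the underlying update
        self.meta_lr = meta_lr        # learning rate of the multiplicative meta-updates
        self.beta = beta              # decay for the running average of gradients
        self.eps = eps
        self.scale = 1.0              # global step-size scale
        self.gains = np.ones(dim)     # per-coordinate gain factors
        self.avg_grad = np.zeros(dim)

    def step(self, params, grad):
        # Alignment of the averaged gradient with the current gradient,
        # normalized so the multiplicative factors stay well-behaved.
        denom = np.linalg.norm(self.avg_grad) * np.linalg.norm(grad) + self.eps
        alignment = np.dot(self.avg_grad, grad) / denom

        # Exponentiated-gradient style updates: grow the scale/gains when
        # successive gradients agree, shrink them when they disagree.
        self.scale *= np.exp(self.meta_lr * alignment)
        self.gains *= np.exp(self.meta_lr * self.avg_grad * grad / denom)

        # Keep the factors in a bounded range (an illustrative safeguard).
        self.scale = float(np.clip(self.scale, 0.1, 10.0))
        self.gains = np.clip(self.gains, 0.1, 10.0)

        # Update the running average used as the alignment signal.
        self.avg_grad = self.beta * self.avg_grad + (1.0 - self.beta) * grad

        # Apply a plain gradient step scaled by the adapted factors.
        return params - self.base_lr * self.scale * self.gains * grad


# Usage on a toy quadratic objective f(x) = 0.5 * x^T A x.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 2.0, 5.0])
    x = rng.normal(size=3)
    opt = EGStepSizeAdapter(dim=3, base_lr=0.001)
    for _ in range(200):
        grad = A @ x
        x = opt.step(x, grad)
    print("final loss:", 0.5 * x @ A @ x)
```

Because the updates are multiplicative, the scale and gains stay positive by construction, which is the usual appeal of exponentiated gradient over additive (gradient descent) tuning of the same quantities.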

