ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization

06/12/2020
by   Xunpeng Huang, et al.

Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention from the optimization and machine learning communities, both for their use of life-long gradient information and for their profound and fundamental mathematical theory. Taking the best of both worlds is the most exciting and challenging question in the field of optimization for machine learning. Along this line, we revisit existing adaptive gradient methods from a novel perspective, refreshing the understanding of second moments. Our new perspective empowers us to attach the properties of second moments to the first moment iteration, and to propose a novel first moment optimizer, the Angle-Calibrated Moment method (ACMo). Our theoretical results show that ACMo is able to achieve the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that ACMo has comparable convergence to SOTA Adam-type optimizers, and gains better generalization performance in most cases.
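To make the idea of attaching second-moment behavior to a pure first-moment iteration more concrete, below is a minimal, illustrative sketch in NumPy. It assumes a momentum-style update in which the component of the previous momentum that conflicts with the current gradient is removed, so the update direction keeps a non-negative inner product with the gradient. The function name `acmo_like_step`, the hyper-parameters, and the exact form of the calibration term are assumptions made for illustration; the precise ACMo update is the one given in the paper.

```python
import numpy as np

def acmo_like_step(x, m_prev, grad_fn, lr=0.1, beta=0.9, eps=1e-8):
    """One step of a first-moment-only update with angle calibration.

    Illustrative sketch only: the calibration removes the part of the
    previous momentum that points against the current gradient, keeping
    the update direction positively correlated with the gradient.
    """
    g = grad_fn(x)
    # Negative-correlation coefficient of m_prev along g (zero if they agree).
    conflict = min(np.dot(m_prev, g), 0.0) / (np.dot(g, g) + eps)
    # First moment built from the current gradient plus the calibrated momentum.
    m = g + beta * (m_prev - conflict * g)
    return x - lr * m, m

# Toy usage on the quadratic objective f(x) = 0.5 * ||x||^2, whose gradient is x.
grad_fn = lambda x: x
x, m = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(100):
    x, m = acmo_like_step(x, m, grad_fn)
print(x)  # approaches the minimizer at the origin
```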


