Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers

07/02/2023
by Yineng Chen, et al.

The optimizer is an essential component in the success of deep learning: it guides the neural network to update its parameters according to the loss on the training set. SGD and Adam are two classical and effective optimizers on which researchers have proposed many variants, such as SGDM and RAdam. In this paper, we combine the backward-looking and forward-looking aspects of optimizer design and propose a novel Admeta (A Double exponential Moving averagE To Adaptive and non-adaptive momentum) optimizer framework. For the backward-looking part, we propose a DEMA variant scheme, motivated by a metric used in the stock market, to replace the common exponential moving average scheme. For the forward-looking part, we present a dynamic lookahead strategy that asymptotically approaches a set value, maintaining the optimizer's speed in the early stage and high convergence performance in the final stage. Based on this idea, we provide two optimizer implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM. Through extensive experiments on diverse tasks, we find that the proposed Admeta optimizers outperform their base optimizers and show advantages over recently proposed competitive optimizers. We also provide a theoretical analysis of these two algorithms, which verifies the convergence of our proposed Admeta.
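The abstract does not spell out the Admeta update rules, but a minimal sketch may help make the two ideas concrete. The Python snippet below is an illustrative assumption, not the authors' algorithm: the function name dema_lookahead_sgd, the hyperparameters (lr, beta, k, alpha_target), and the alpha_t schedule are all hypothetical. It combines the classical stock-market DEMA indicator, 2*EMA(g) - EMA(EMA(g)), as a backward-looking momentum term with a Lookahead-style slow/fast-weight interpolation whose coefficient asymptotically approaches a set value as the forward-looking part.

import numpy as np

def dema_lookahead_sgd(grad_fn, w0, lr=0.1, beta=0.9, k=5, alpha_target=0.8, steps=100):
    # Backward-looking: DEMA of gradients, DEMA = 2*EMA(g) - EMA(EMA(g)),
    # the stock-market indicator the paper cites as motivation.
    # Forward-looking: Lookahead-style slow/fast weights whose interpolation
    # coefficient alpha_t asymptotically approaches alpha_target (assumed schedule).
    w_fast = np.array(w0, dtype=float)
    w_slow = w_fast.copy()
    ema1 = np.zeros_like(w_fast)      # EMA of gradients
    ema2 = np.zeros_like(w_fast)      # EMA of ema1
    for t in range(1, steps + 1):
        g = grad_fn(w_fast)
        ema1 = beta * ema1 + (1 - beta) * g
        ema2 = beta * ema2 + (1 - beta) * ema1
        dema = 2.0 * ema1 - ema2      # backward-looking momentum estimate
        w_fast = w_fast - lr * dema
        if t % k == 0:                # forward-looking synchronisation every k steps
            alpha_t = alpha_target * t / (t + k)   # hypothetical schedule -> alpha_target
            w_slow = w_slow + alpha_t * (w_fast - w_slow)
            w_fast = w_slow.copy()
    return w_slow

# Usage: minimise f(w) = ||w||^2, whose gradient is 2w.
print(dema_lookahead_sgd(lambda w: 2.0 * w, w0=[3.0, -2.0]))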
