Leveraging the Triple Exponential Moving Average for Fast-Adaptive Moment Estimation

06/02/2023
by Roi Peleg, et al.

Network optimization is a crucial step in deep learning, as it directly affects the performance of models in domains such as computer vision. Despite the many optimizers developed over the years, current methods are still limited in their ability to identify gradient trends accurately and quickly, which can lead to suboptimal network performance. In this paper, we propose a novel deep optimizer called Fast-Adaptive Moment Estimation (FAME), which for the first time estimates gradient moments using a Triple Exponential Moving Average (TEMA). Incorporating TEMA into the optimization process provides richer and more accurate information on data changes and trends than the standard Exponential Moving Average used in essentially all current leading adaptive optimization methods. Our proposed FAME optimizer has been extensively validated across a wide range of benchmarks, including CIFAR-10, CIFAR-100, PASCAL-VOC, MS-COCO, and Cityscapes, using 14 different learning architectures, six optimizers, and various vision tasks, including detection, classification, and semantic understanding. The results demonstrate that FAME outperforms other leading optimizers in terms of both robustness and accuracy.
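For intuition, the core idea can be sketched in code: rather than a single EMA of the gradient, a TEMA maintains three cascaded EMAs and combines them as 3*EMA1 - 3*EMA2 + EMA3, which reduces the lag of a plain EMA. The snippet below is a minimal sketch assuming the standard TEMA recursion; the function name, the decay rate beta, and the omission of bias correction are illustrative assumptions, not the authors' exact FAME update rule.

    import numpy as np

    def tema_update(grad, state, beta=0.9):
        """One TEMA step for a gradient tensor.

        TEMA(g) = 3*ema1 - 3*ema2 + ema3, where ema1 is an EMA of g,
        ema2 an EMA of ema1, and ema3 an EMA of ema2.
        `state` holds (ema1, ema2, ema3); `beta` is an assumed decay rate.
        """
        ema1, ema2, ema3 = state
        ema1 = beta * ema1 + (1 - beta) * grad   # first-level EMA of the gradient
        ema2 = beta * ema2 + (1 - beta) * ema1   # EMA of the first EMA
        ema3 = beta * ema3 + (1 - beta) * ema2   # EMA of the second EMA
        tema = 3 * ema1 - 3 * ema2 + ema3        # triple EMA: less lag than a plain EMA
        return tema, (ema1, ema2, ema3)

    # Example: track a noisy gradient signal over 100 steps
    state = (np.zeros(3), np.zeros(3), np.zeros(3))
    for step in range(100):
        g = np.random.randn(3)                   # stand-in for a per-step gradient
        moment, state = tema_update(g, state)

In an adaptive optimizer, such a TEMA-based estimate would replace the plain EMA moment estimates; the exact way FAME applies it to the first and second moments is described in the full paper.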


