
Lookahead Optimizer: k steps forward, 1 step back

07/19/2019
by Michael R. Zhang, et al.
University of Toronto

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
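
The slow-/fast-weight scheme described above is compact enough to sketch in code. Below is a minimal, illustrative PyTorch wrapper, not the authors' reference implementation: the class name, constructor signature, and the toy usage are assumptions made for the example, while the update rule itself (k steps of an inner optimizer on the fast weights, then moving the slow weights a fraction alpha toward them and resetting) follows the abstract.

import torch

class Lookahead:
    """Minimal sketch of the Lookahead update rule (illustrative only).

    The wrapped inner optimizer updates the "fast weights". Every k inner
    steps, the "slow weights" phi are pulled toward the fast weights theta,
        phi <- phi + alpha * (theta - phi),
    and the fast weights are then reset to the new slow weights.
    """

    def __init__(self, inner_optimizer, k=5, alpha=0.5):
        self.inner = inner_optimizer
        self.k = k
        self.alpha = alpha
        self._steps = 0
        # Slow weights start as a detached copy of the current parameters.
        self._slow = [[p.detach().clone() for p in group["params"]]
                      for group in self.inner.param_groups]

    def zero_grad(self, set_to_none=True):
        self.inner.zero_grad(set_to_none=set_to_none)

    def step(self):
        loss = self.inner.step()          # one fast-weight step
        self._steps += 1
        if self._steps % self.k == 0:     # every k steps: slow-weight update
            with torch.no_grad():
                for group, slow in zip(self.inner.param_groups, self._slow):
                    for theta, phi in zip(group["params"], slow):
                        phi.add_(theta.detach() - phi, alpha=self.alpha)
                        theta.data.copy_(phi)   # reset fast weights to slow weights
        return loss

# Toy usage: wrap SGD with momentum on a hypothetical regression problem.
model = torch.nn.Linear(10, 2)
opt = Lookahead(torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9),
                k=5, alpha=0.5)
x, y = torch.randn(32, 10), torch.randn(32, 2)
for _ in range(20):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

Because the wrapper adds only a second copy of the weights and one interpolation every k steps, its extra computation and memory cost is small, which is the point made in the abstract; the same pattern applies unchanged when the inner optimizer is Adam.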

Related Research

10/18/2021

Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization

Heavy ball momentum is crucial in accelerating (stochastic) gradient-bas...
10/15/2020

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Most popular optimizers for deep learning can be broadly categorized as ...
12/24/2020

AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy

Optimizers that further adjust the scale of gradient, such as Adam, Natu...
09/21/2017

Neural Optimizer Search with Reinforcement Learning

We present an approach to automate the process of discovering optimizati...
12/04/2019

Domain-independent Dominance of Adaptive Methods

From a simplified analysis of adaptive methods, we derive AvaGrad, a new...
09/29/2022

NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizers

Classical machine learning models such as deep neural networks are usual...

Code Repositories

lookahead.pytorch

Lookahead optimizer (Lookahead Optimizer: k steps forward, 1 step back) for PyTorch.

lookahead

Implementation of the Lookahead Optimizer.

keras_lookahead

Lookahead optimizer for Keras.

BDC2019

University Competition 2019: text click prediction.

Optimizer-PyTorch

Package of optimizers implemented with PyTorch.