AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy

12/24/2020
by Zedong Tang, et al.

Optimizers that further adjust the scale of the gradient, such as Adam and Natural Gradient (NG), are widely studied and used by the community, yet they often generalize worse than Stochastic Gradient Descent (SGD). They tend to converge quickly at the beginning of training but weaken toward the end. An immediate idea is to complement the strengths of these algorithms with SGD. However, abruptly switching optimizers often disrupts the update pattern, and the new algorithm typically needs many iterations to stabilize its search direction. Driven by this idea and to address this problem, we design and present a regularized natural gradient optimization algorithm with a look-ahead strategy, named Asymptotic Natural Gradient (ANG). Based on the total number of iterations, ANG dynamically assembles the NG and Euclidean gradient directions and updates the parameters along the combined direction using the intensity of NG. Validation experiments on the CIFAR10 and CIFAR100 datasets show that ANG updates smoothly and stably at second-order speed while achieving better generalization performance.
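The abstract does not spell out the update rule, but the description suggests blending the natural-gradient direction with the Euclidean gradient according to training progress and rescaling the blend to the intensity (norm) of the NG step. The sketch below only illustrates that idea under assumed choices (a linear mixing schedule, a damped dense Fisher estimate, NumPy linear algebra); it is not the paper's exact algorithm.

```python
import numpy as np

def ang_step(grad, fisher, t, T, lr, damping=1e-3):
    """Illustrative ANG-style update (assumed formulation, not the paper's exact rule).

    grad    : Euclidean gradient of the loss w.r.t. the parameters, shape (d,)
    fisher  : estimated Fisher information matrix, shape (d, d)
    t, T    : current iteration and total number of iterations
    lr      : base learning rate
    """
    d = grad.shape[0]

    # Regularized natural-gradient direction: (F + damping * I)^{-1} g
    ng_dir = np.linalg.solve(fisher + damping * np.eye(d), grad)

    # Assumed schedule: the weight shifts from NG toward the plain
    # Euclidean gradient as training progresses (t -> T).
    alpha = 1.0 - t / T

    # Blend the two directions, then rescale the blend to the
    # "intensity" (norm) of the natural-gradient direction.
    mixed = alpha * ng_dir + (1.0 - alpha) * grad
    mixed *= np.linalg.norm(ng_dir) / (np.linalg.norm(mixed) + 1e-12)

    # Caller applies: theta <- theta - ang_step(...)
    return lr * mixed
```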

Related research

12/20/2017  Improving Generalization Performance by Switching from Adam to SGD
Despite superior training outcomes, adaptive optimization methods such a...

07/19/2019  Lookahead Optimizer: k steps forward, 1 step back
The vast majority of successful deep neural networks are trained using v...

04/11/2020  Exploit Where Optimizer Explores via Residuals
To train neural networks faster, many research efforts have been devoted...

02/20/2020  Bounding the expected run-time of nonconvex optimization with early stopping
This work examines the convergence of stochastic gradient-based optimiza...

06/30/2023  Resetting the Optimizer in Deep RL: An Empirical Study
We focus on the task of approximating the optimal value function in deep...

07/29/2019  Deep Gradient Boosting
Stochastic gradient descent (SGD) has been the dominant optimization met...

05/05/2022  LAWS: Look Around and Warm-Start Natural Gradient Descent for Quantum Neural Networks
Variational quantum algorithms (VQAs) have recently received significant...
