XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

05/26/2023
by Lei Guan, et al.

In this paper, we propose XGrad, a general deep learning training framework that introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch, the future weights are predicted according to the update rule of the optimizer in use and are then applied to both the forward pass and the backward propagation. In this way, throughout training the optimizer always uses the gradients w.r.t. the future weights to update the DNN parameters, which yields better convergence and generalization than the original optimizer without weight prediction. XGrad is straightforward to implement yet effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results on three of the most popular gradient-based optimizers, SGD with momentum, Adam, and AdamW, demonstrate the effectiveness of our proposal and validate that XGrad attains higher model accuracy than the original optimizers when training DNN models. The code of XGrad will be available at: https://github.com/guanleics/XGrad.
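To make the mechanism concrete, below is a minimal PyTorch sketch of the weight-prediction idea for SGD with momentum, not the authors' implementation: the helper names predict_weights and restore_weights are illustrative, and the one-step-ahead prediction rule w - lr * mu * buf is an approximation of the momentum update used only for this example.

```python
# Minimal sketch (assumed, not the XGrad codebase) of weight prediction with SGD + momentum.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def predict_weights(optimizer):
    """Cache the current weights, then move the parameters to where the
    momentum update rule is expected to put them: w <- w - lr * mu * buf."""
    cache = []
    for group in optimizer.param_groups:
        lr, mu = group["lr"], group["momentum"]
        for p in group["params"]:
            cache.append(p.detach().clone())
            buf = optimizer.state.get(p, {}).get("momentum_buffer")
            if buf is not None:  # no prediction before the first optimizer step
                p.data.add_(buf, alpha=-lr * mu)
    return cache

def restore_weights(optimizer, cache):
    """Copy the cached (current) weights back before the real optimizer step."""
    it = iter(cache)
    for group in optimizer.param_groups:
        for p in group["params"]:
            p.data.copy_(next(it))

for _ in range(3):  # toy training loop with random data
    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    cache = predict_weights(optimizer)   # forward/backward run at the predicted weights
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    restore_weights(optimizer, cache)    # gradients are w.r.t. the future weights,
    optimizer.step()                     # but the step is applied to the current weights
```

The key point illustrated here is the ordering: the forward and backward passes see the predicted future weights, while the optimizer step is still applied to the cached current weights using those gradients.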

Related research

02/01/2023 - Weight Prediction Boosts the Convergence of AdamW
In this paper, we introduce weight prediction into the AdamW optimizer t...

04/03/2020 - Gradient Centralization: A New Optimization Technique for Deep Neural Networks
Optimization techniques are of great importance to effectively and effic...

06/13/2023 - Lookaround Optimizer: k steps around, 1 step average
Weight Average (WA) is an active research topic due to its simplicity in...

06/19/2017 - meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
We propose a simple yet effective technique for neural network learning....

08/12/2021 - Logit Attenuating Weight Normalization
Over-parameterized deep networks trained using gradient-based optimizers...

09/05/2019 - Diversely Stale Parameters for Efficient Training of CNNs
The backpropagation algorithm is the most popular algorithm training neu...

05/22/2023 - Adaptive Gradient Prediction for DNN Training
Neural network training is inherently sequential where the layers finish...
