Adaptive Multi-level Hyper-gradient Descent

08/17/2020
by Renlong Jie, et al.

Adaptive learning rates can lead to faster convergence and better final performance for deep learning models. There are several widely known human-designed adaptive optimizers such as Adam and RMSProp, gradient-based adaptive methods such as hyper-descent and L4, and meta-learning approaches such as learning-to-learn. However, the issue of balancing adaptiveness against over-parameterization remains open. In this study, we investigate different levels of learning-rate adaptation within the framework of hyper-gradient descent, and further propose a method that adaptively learns the parameters for combining adaptations at different levels. We also show the relationship between regularizing over-parameterized learning rates and building combinations of adaptive learning rates at different levels. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances with statistical significance.
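For context, the hyper-gradient descent framework the paper builds on adapts the learning rate itself by gradient descent on the loss. Below is a minimal, self-contained sketch of that baseline rule at the global (single learning rate) level, following Baydin et al. (2018); the names `hypergradient_sgd`, `grad_fn`, and `beta` are illustrative assumptions, and this is the baseline framework, not the paper's proposed multi-level method.

```python
import numpy as np

def hypergradient_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """Plain SGD whose global learning rate is itself learned online.

    Hyper-gradient rule: dL/d(alpha) = -g_t . g_{t-1}, so alpha grows when
    consecutive gradients agree and shrinks when they oppose each other.
    """
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # Hyper-gradient step on the learning rate itself.
        alpha += beta * float(np.dot(g, g_prev))
        # Ordinary SGD step on the model parameters.
        theta = theta - alpha * g
        g_prev = g
    return theta, alpha

# Toy usage: minimise f(x) = 0.5 * ||x||^2, whose gradient is x.
theta, alpha = hypergradient_sgd(lambda x: x, theta=np.ones(3))
```

The paper's "levels" generalize this idea: the same hyper-gradient update can be applied to a global rate, per-layer rates, or per-parameter rates, and the proposed method learns how to combine them.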


research
04/12/2021

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

We propose Meta-Regularization, a novel approach for the adaptive choice...
research
04/20/2023

Angle based dynamic learning rate for gradient descent

In our work, we propose a novel yet simple approach to obtain an adaptiv...
research
05/27/2022

Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training

In this paper, we incorporate the Barzilai-Borwein step size into gradie...
research
08/08/2020

Why to "grow" and "harvest" deep learning models?

Current expectations from training deep learning models with gradient-ba...
research
03/05/2022

Meta Mirror Descent: Optimiser Learning for Fast Convergence

Optimisers are an essential component for training machine learning mode...
research
07/18/2021

A New Adaptive Gradient Method with Gradient Decomposition

Adaptive gradient methods, especially Adam-type methods (such as Adam, A...
