Adaptive Multi-level Hyper-gradient Descent

08/17/2020 ∙ by Renlong Jie, et al.

Adaptive learning rates can lead to faster convergence and better final performance for deep learning models. Widely known approaches include human-designed adaptive optimizers such as Adam and RMSProp, gradient-based adaptive methods such as hyper-descent and L4, and meta-learning approaches such as learning to learn. However, balancing adaptiveness against over-parameterization remains an open problem. In this study, we investigate different levels of learning rate adaptation within the framework of hyper-gradient descent, and further propose a method that adaptively learns the parameters used to combine these different levels of adaptation. We also show the relationship between regularizing over-parameterized learning rates and combining different levels of adaptive learning rates. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances with statistical significance.
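To make "levels of learning rate adaptation" concrete, here is a minimal sketch of plain SGD with hyper-gradient descent in the style of Baydin et al. (2018), the base technique the abstract refers to. It uses the fact that the derivative of the loss at step t with respect to the previous learning rate is -g_t · g_{t-1}, so a descent step on the learning rate adds beta * g_t · g_{t-1}. The sketch shows only two levels (one global rate, or one rate per parameter); it does not reproduce the paper's learned combination of levels. The function name `sgd_hd` and its parameters are hypothetical, chosen for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def sgd_hd(
    grad: Callable[[Vector], Vector],
    theta0: Vector,
    alpha0: float,
    beta: float,
    steps: int,
    per_parameter: bool = False,
) -> Tuple[Vector, List[float]]:
    """Hypothetical sketch: SGD whose learning rate is itself updated by
    hyper-gradient descent (Baydin et al., 2018 style).

    per_parameter=False -> one global learning rate shared by all parameters.
    per_parameter=True  -> one learning rate per parameter coordinate.
    Returns the final parameters and the final learning rate(s).
    """
    theta = list(theta0)
    n = len(theta)
    alphas = [alpha0] * n
    prev_g = [0.0] * n  # no previous gradient before the first step
    for _ in range(steps):
        g = grad(theta)
        # d loss(theta_t) / d alpha = -g_t . g_{t-1}; descending on alpha
        # therefore adds beta * g_t . g_{t-1}.
        if per_parameter:
            alphas = [a + beta * gi * pi for a, gi, pi in zip(alphas, g, prev_g)]
        else:
            h = sum(gi * pi for gi, pi in zip(g, prev_g))
            alphas = [alphas[0] + beta * h] * n
        # Ordinary SGD step using the freshly adapted learning rate(s).
        theta = [t - a * gi for t, a, gi in zip(theta, alphas, g)]
        prev_g = g
    return theta, alphas
```

A global rate adds a single hyper-parameter, while the per-parameter variant adds one per model parameter; this is the adaptiveness versus over-parameterization trade-off the abstract describes.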


