Differentiable Self-Adaptive Learning Rate

by Bozhou Chen, et al.

Learning rate adaptation is a popular topic in machine learning. Plain gradient descent trains a neural network with a fixed learning rate, and adaptive methods were proposed to accelerate training by adjusting the step size during the training session. Well-known examples include Momentum, Adam, and Hypergradient. Hypergradient is the most distinctive of these: it achieves adaptation by computing the derivative of the cost function with respect to the learning rate and then applying gradient descent to the learning rate itself. However, Hypergradient is still not perfect. In practice, it fails with high probability to decrease the training loss after the learning rate adaptation. Moreover, evidence has been found that Hypergradient is not well suited to large datasets trained in mini-batches. Most unfortunately, Hypergradient often fails to reach good accuracy on the validation dataset even when it reduces the training loss to a very small value. To solve Hypergradient's problems, we propose a novel adaptation algorithm in which the learning rate is parameter-specific and internally structured. We conduct extensive experiments on multiple network models and datasets, comparing against various benchmark optimizers. The results show that our algorithm achieves faster and higher-quality convergence than those state-of-the-art optimizers.
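The Hypergradient mechanism the abstract describes can be sketched in a few lines. The following is a minimal, illustrative implementation of hypergradient descent on a toy quadratic; the function names and the test problem are assumptions for illustration, not the paper's actual algorithm or code.

```python
import numpy as np

def hypergradient_descent(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """Plain gradient descent whose scalar learning rate `alpha` is itself
    adapted by gradient descent on the loss (a hypergradient-style rule).

    Since theta_t = theta_{t-1} - alpha * grad_f(theta_{t-1}), the derivative
    of f(theta_t) with respect to alpha is -grad_f(theta_t) . grad_f(theta_{t-1}),
    so descending on alpha adds beta times the dot product of consecutive
    gradients. Names here (`beta`, `grad_fn`) are illustrative assumptions.
    """
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # Hypergradient step: grow alpha when consecutive gradients agree
        # in direction, shrink it when they oppose each other.
        alpha = alpha + beta * np.dot(g, prev_grad)
        theta = theta - alpha * g
        prev_grad = g
    return theta, alpha

# Toy problem: f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta0 = np.array([5.0, -3.0])
theta_final, alpha_final = hypergradient_descent(lambda t: t, theta0)
```

On this convex toy problem the gradients stay aligned, so `alpha` grows from its initial value while `theta` converges toward the minimizer at the origin. Note that the learning rate here is a single shared scalar, in contrast to the parameter-specific learning rates the paper proposes.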




