k-decay: A New Method For Learning Rate Schedule

04/13/2020
by Tao Zhang, et al.

It is well known that the learning rate is the most important hyper-parameter in deep learning, and learning rate schedules are commonly used when training neural networks. This paper puts forward a new method for constructing learning rate schedules, named k-decay, which can be applied to any differentiable decay function to derive a new schedule function. In the new function, the degree of decay is controlled by a new hyper-parameter k, and the original function is recovered as the special case k = 1. The paper applies k-decay to the polynomial, cosine and exponential functions to obtain their new forms. We evaluate the k-decay method with the new polynomial function on the CIFAR-10 and CIFAR-100 datasets with different neural networks (ResNet, Wide ResNet and DenseNet), and the results show improvements over the state-of-the-art schedules on most of them. Our experiments also show that model performance improves as k increases from 1.
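For illustration, the sketch below shows one plausible way to realize the k-decay idea for the polynomial and cosine schedules: the progress ratio t/T is raised to the power k, so that k = 1 reduces to the original schedule and larger k keeps the learning rate higher for longer before decaying. The exact functional forms, the exponent N, and the parameter names are assumptions made for this example, not taken verbatim from the paper.

```python
import math

def poly_k_decay(t, T, lr0=0.1, lr_end=0.0, N=2, k=1.0):
    """Illustrative polynomial learning-rate schedule with k-decay.

    Assumed form: the progress ratio t/T of the standard polynomial
    decay is raised to the power k, so k = 1 recovers the original
    polynomial schedule.
    """
    progress = (t / T) ** k  # k controls how late and how sharply the decay happens
    return lr_end + (lr0 - lr_end) * (1.0 - progress) ** N

def cosine_k_decay(t, T, lr0=0.1, lr_end=0.0, k=1.0):
    """Same idea applied to cosine annealing; k = 1 gives plain cosine decay."""
    progress = (t / T) ** k
    return lr_end + 0.5 * (lr0 - lr_end) * (1.0 + math.cos(math.pi * progress))

# Example: compare k = 1 (the original schedule) with a larger k, which
# holds the learning rate near lr0 for longer and then decays faster.
T = 200
for k in (1.0, 3.0):
    print(k, [round(poly_k_decay(t, T, k=k), 4) for t in (0, 50, 100, 150, 200)])
```

In this sketch the schedule starts at lr0 at t = 0 and reaches lr_end at t = T for every k; only the shape of the decay in between changes with k.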

Related research:

Stochastic Gradient Descent with Hyperbolic-Tangent Decay (06/05/2018)
Cyclical Learning Rates for Training Neural Networks (06/03/2015)
Forget the Learning Rate, Decay Loss (04/27/2019)
REX: Revisiting Budgeted Training with an Improved Schedule (07/09/2021)
AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly (05/22/2021)
Population Gradients improve performance across data-sets and architectures in object classification (10/23/2020)
S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks (10/17/2021)
