Stochastic gradient descent with random learning rate

03/15/2020
by Daniele Musso, et al.

We propose to optimize neural networks with a uniformly distributed random learning rate. The associated stochastic gradient descent algorithm can be approximated by continuous stochastic equations and analyzed with the Fokker-Planck formalism. In the small learning rate approximation, the training process is characterized by an effective temperature which depends on the average learning rate, the mini-batch size and the momentum of the optimization algorithm. By comparing the random learning rate protocol with cyclic and constant protocols, we suggest that the random choice is generically the best strategy in the small learning rate regime, yielding better regularization without extra computational cost. We provide supporting evidence through experiments on both shallow fully-connected networks and deep convolutional networks for image classification on the MNIST and CIFAR10 datasets.
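As a rough illustration of the protocol described in the abstract, the sketch below implements mini-batch SGD in which the learning rate is redrawn from a uniform distribution at every optimizer step. The sampling interval [0, 2 * lr_mean] (chosen so the average rate equals lr_mean), the per-step rather than per-epoch resampling, the use of PyTorch, and the function name train_random_lr are illustrative assumptions, not details taken from the paper.

```python
import random

import torch
import torch.nn as nn

def train_random_lr(model, loader, lr_mean=0.01, momentum=0.9, epochs=1):
    """Minimal sketch: SGD whose learning rate is redrawn uniformly at each step."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_mean, momentum=momentum)
    for _ in range(epochs):
        for inputs, targets in loader:
            # Assumed interval: redraw the rate from [0, 2 * lr_mean],
            # so its mean is lr_mean (the paper's exact choice may differ).
            lr = random.uniform(0.0, 2.0 * lr_mean)
            for group in optimizer.param_groups:
                group["lr"] = lr
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```

Because only the step size is randomized, this protocol requires no extra gradient evaluations per step, consistent with the claim that the random choice comes without extra computational cost.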


Related research

02/01/2023
QLAB: Quadratic Loss Approximation-Based Optimal Learning Rate for Deep Learning
We propose a learning rate adaptation scheme, called QLAB, for descent o...

12/15/2016
Coupling Adaptive Batch Sizes with Learning Rates
Mini-batch stochastic gradient descent and variants thereof have become ...

02/17/2023
On Equivalent Optimization of Machine Learning Methods
At the core of many machine learning methods resides an iterative optimi...

02/13/2022
Reverse Back Propagation to Make Full Use of Derivative
The development of the back-propagation algorithm represents a landmark ...

12/20/2022
Normalized Stochastic Gradient Descent Training of Deep Neural Networks
In this paper, we introduce a novel optimization algorithm for machine l...

05/18/2020
Parsimonious Computing: A Minority Training Regime for Effective Prediction in Large Microarray Expression Data Sets
Rigorous mathematical investigation of learning rates used in back-propa...

05/13/2023
Depth Dependence of μP Learning Rates in ReLU MLPs
In this short note we consider random fully connected ReLU networks of w...
