Learning with Random Learning Rates

10/02/2018
by   Léonard Blier, et al.

Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate, sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems. Alrao could save time when testing deep learning models: a range of models could be quickly assessed with Alrao, and the most promising models could then be trained more extensively. This text comes with a PyTorch implementation of the method, which can be plugged into an existing PyTorch model.
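To make the per-unit sampling idea concrete, here is a minimal sketch in plain Python. It is not the authors' PyTorch implementation. The abstract only says the rates come from a random distribution spanning several orders of magnitude; the log-uniform choice, the bounds `lr_min` and `lr_max`, and the helper names `sample_unit_learning_rates` and `sgd_step_per_unit` are assumptions made for illustration.

```python
import math
import random


def sample_unit_learning_rates(n_units, lr_min=1e-5, lr_max=10.0, seed=0):
    """Draw one learning rate per unit.

    Assumption: rates are log-uniform on [lr_min, lr_max], so every order
    of magnitude in that range is equally likely.
    """
    rng = random.Random(seed)
    log_min, log_max = math.log(lr_min), math.log(lr_max)
    return [math.exp(rng.uniform(log_min, log_max)) for _ in range(n_units)]


def sgd_step_per_unit(weights, grads, unit_lrs):
    """One plain SGD step where row i of `weights` (unit i) uses unit_lrs[i]."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row, lr in zip(weights, grads, unit_lrs)
    ]


# Toy layer: 4 units, each with 3 incoming weights (hypothetical values).
weights = [[0.5, -0.2, 0.1] for _ in range(4)]
grads = [[1.0, 1.0, 1.0] for _ in range(4)]

# Learning rates are sampled once and then kept fixed for each unit.
unit_lrs = sample_unit_learning_rates(n_units=4)
new_weights = sgd_step_per_unit(weights, grads, unit_lrs)
```

Sampling in log space is what lets a handful of units land near a good learning rate even when that rate is unknown in advance; sampling uniformly in linear space would place almost all units near the upper bound.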


research
08/20/2019

Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) methods are prominent for training mac...
research
07/31/2020

Deep Reinforcement Learning using Cyclical Learning Rates

Deep Reinforcement Learning (DRL) methods often rely on the meticulous t...
research
08/08/2020

Why to "grow" and "harvest" deep learning models?

Current expectations from training deep learning models with gradient-ba...
research
08/07/2018

Robust Implicit Backpropagation

Arguably the biggest challenge in applying neural networks is tuning the...
research
07/06/2020

TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

We investigate whether Jacobi preconditioning, accounting for the bootst...
research
09/05/2017

Stochastic Gradient Descent: Going As Fast As Possible But Not Faster

When applied to training deep neural networks, stochastic gradient desce...
research
08/26/2018

Deep Learning: Computational Aspects

In this article we review computational aspects of Deep Learning (DL). D...
