DeepAI AI Chat
Log In Sign Up

Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation

06/25/2020
by   Victor Picheny, et al.
0

Many machine learning models require a training procedure based on running stochastic gradient descent. A key element for the efficiency of those algorithms is the choice of the learning rate schedule. While finding good learning rates schedules using Bayesian optimisation has been tackled by several authors, adapting it dynamically in a data-driven way is an open question. This is of high practical importance to users that need to train a single, expensive model. To tackle this problem, we introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-/regressive formulation, that flexibly adjusts to abrupt changes of behaviours induced by new learning rate values. As illustrated, this model is well-suited to tackle a set of problems: first, for the on-line adaptation of the learning rate for a cold-started run; then, for tuning the schedule for a set of similar tasks (in a classical BO setup), as well as warm-starting it for a new task.

READ FULL TEXT

page 1

page 2

page 3

page 4

03/14/2017

Online Learning Rate Adaptation with Hypergradient Descent

We introduce a general method for improving the convergence rate of grad...
09/21/2019

Using Statistics to Automate Stochastic Optimization

Despite the development of numerous adaptive optimizers, tuning the lear...
06/23/2020

On Compression Principle and Bayesian Optimization for Neural Networks

Finding methods for making generalizable predictions is a fundamental pr...
05/22/2017

Training Deep Networks without Learning Rates Through Coin Betting

Deep learning methods achieve state-of-the-art performance in many appli...
05/22/2021

AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

The learning rate (LR) schedule is one of the most important hyper-param...
08/08/2020

Why to "grow" and "harvest" deep learning models?

Current expectations from training deep learning models with gradient-ba...