Speed learning on the fly

11/08/2015
by Pierre-Yves Massé, et al.

The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications. The same is true for more advanced variants of stochastic gradients, such as SAGA, SVRG, or AdaGrad. Here we propose to adapt the step size by performing a gradient descent on the step size itself, viewing the whole performance of the learning trajectory as a function of step size. Importantly, this adaptation can be computed online at little cost, without having to iterate backward passes over the full data.
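The idea sketched in the abstract — take a gradient step on the step size itself, using a sensitivity that is propagated forward along the trajectory — can be illustrated with a short, self-contained snippet. The sketch below is not the authors' exact algorithm: it assumes a generic stochastic gradient oracle `grad(theta, t)`, keeps a running estimate `h ≈ dθ/dη`, and drops the second-order (Hessian) term in the sensitivity update for brevity. All names (`sgd_with_adaptive_step_size`, `meta_rate`) are illustrative.

```python
import numpy as np

def sgd_with_adaptive_step_size(grad, theta0, eta0=0.01, meta_rate=1e-4, n_steps=1000):
    """Plain SGD whose step size eta is itself adapted by a gradient step.

    `grad(theta, t)` is assumed to return a stochastic gradient of the loss
    at iteration t. The sensitivity h ~ d(theta)/d(eta) is propagated online,
    so no backward pass over past data is needed. Illustrative sketch only.
    """
    theta = np.asarray(theta0, dtype=float)
    eta = eta0
    h = np.zeros_like(theta)              # running estimate of d(theta)/d(eta)
    for t in range(n_steps):
        g = grad(theta, t)
        # Adapt the step size: d(loss)/d(eta) ~ g . h by the chain rule.
        eta = max(eta - meta_rate * float(np.vdot(g, h)), 1e-12)
        # Update the sensitivity; the term involving the Hessian of the loss
        # is ignored here to keep the sketch simple.
        h = h - g                          # since theta_{t+1} = theta_t - eta * g
        # Ordinary SGD step with the current step size.
        theta = theta - eta * g
    return theta, eta

if __name__ == "__main__":
    # Toy usage: noisy gradients of the quadratic loss 0.5 * ||theta||^2.
    rng = np.random.default_rng(0)
    noisy_grad = lambda theta, t: theta + 0.1 * rng.standard_normal(theta.shape)
    theta, eta = sgd_with_adaptive_step_size(noisy_grad, np.ones(5))
    print(theta, eta)
```

Because the Hessian term is dropped, the per-step cost of this sketch stays essentially the same as plain SGD, which matches the abstract's claim that the adaptation can be computed online at little cost.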

