Statistical Adaptive Stochastic Gradient Methods

02/25/2020
by Pengchuan Zhang, et al.

We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods. SALSA first uses a smoothed stochastic line-search procedure to gradually increase the learning rate, then automatically switches to a statistical method to decrease the learning rate. The line search procedure “warms up” the optimization process, reducing the need for expensive trial and error in setting an initial learning rate. The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size. Unlike in prior work, our test applies to a broad class of stochastic gradient algorithms without modification. The combined method is highly robust and autonomous, and it matches the performance of the best hand-tuned learning rate schedules in our experiments on several deep learning tasks.
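The decreasing phase described above rests on a stationarity test: run SGD at a constant step size, and once the loss trajectory shows no statistically significant decrease, shrink the learning rate. The paper's actual test is not reproduced here; the sketch below is a simplified stand-in that applies a one-sided z-test to per-step loss changes on a noisy 1-D quadratic. All names, thresholds, and the decay factor are illustrative choices, not taken from the paper.

```python
import math
import random

def stationarity_test(deltas, z_crit=1.96):
    """One-sided z-test on per-step loss changes: return True when the
    mean change is not significantly negative, i.e. the constant-step
    iterates look stationary."""
    n = len(deltas)
    mean = sum(deltas) / n
    var = sum((d - mean) ** 2 for d in deltas) / max(n - 1, 1)
    se = math.sqrt(var / n) + 1e-12
    return mean / se > -z_crit  # no significant decrease detected

def sgd_with_stat_decay(steps=6000, lr=0.5, window=300, decay=0.1, seed=0):
    """SGD on f(x) = x^2 / 2 with Gaussian gradient noise; whenever the
    test fires on a window of loss changes, multiply lr by `decay` and
    start collecting a fresh window."""
    rng = random.Random(seed)
    x, prev_loss, deltas = 5.0, None, []
    for _ in range(steps):
        g = x + rng.gauss(0.0, 1.0)      # noisy gradient of x^2 / 2
        x -= lr * g
        loss = 0.5 * x * x
        if prev_loss is not None:
            deltas.append(loss - prev_loss)
        prev_loss = loss
        if len(deltas) >= window:
            if stationarity_test(deltas):
                lr *= decay              # constant-step phase exhausted
            deltas.clear()
    return x, lr

x_final, lr_final = sgd_with_stat_decay()
```

In a full SALSA-style procedure this test would be preceded by the line-search warm-up phase and applied to the statistics of the underlying stochastic gradient method rather than raw loss differences; the sketch only illustrates the detect-then-decrease mechanism.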


Related research

- Using Statistics to Automate Stochastic Optimization (09/21/2019): Despite the development of numerous adaptive optimizers, tuning the lear...
- Fast Line Search for Multi-Task Learning (10/02/2021): Multi-task learning is a powerful method for solving several tasks joint...
- Agnostic Physics-Driven Deep Learning (05/30/2022): This work establishes that a physical system can perform statistical lea...
- Adaptive Learning Rate and Momentum for Training Deep Neural Networks (06/22/2021): Recent progress on deep learning relies heavily on the quality and effic...
- A Dynamic Sampling Adaptive-SGD Method for Machine Learning (12/31/2019): We propose a stochastic optimization method for minimizing loss function...
- Learning-Rate-Free Learning: Dissecting D-Adaptation and Probabilistic Line Search (08/06/2023): This paper explores two recent methods for learning rate optimisation in...
- Convergence diagnostics for stochastic gradient descent with constant step size (10/17/2017): Iterative procedures in stochastic optimization are typically comprised ...
