Many machine learning problems can be framed as minimizing a loss function over a large (maybe infinite) number of samples. In representation learning, those loss functions are generally built on top of multiple layers of non-linearities, precluding any direct or closed-form optimization, but admitting (sample) gradients to guide iterative optimization of the loss.
Stochastic gradient descent (SGD) is among the most broadly applicable and widely-used algorithms for such learning tasks, because of its simplicity, robustness and scalability to arbitrarily large datasets. Doing many small but noisy updates instead of fewer large ones (as in batch methods) both speeds up learning and makes the process less likely to get stuck in sensitive local optima. In addition, SGD is eminently well-suited for learning in non-stationary environments, e.g., when the data stream is generated by a changing environment; but non-stationary adaptivity is useful even on stationary problems, as the initial search phase of the learning process (before a local optimum is located) can be likened to a non-stationary environment.
Given the increasingly wide adoption of machine learning tools, there is an undoubted benefit to making learning algorithms, and SGD in particular, easy to use and hyper-parameter free. In recent work [1], we made SGD hyper-parameter free by introducing optimal adaptive learning rates that are based on gradient variance estimates. While broadly successful, the approach was limited to smooth loss functions, and to minibatch sizes of one. In this paper, we therefore complement that work, by addressing and resolving the issues of
minibatches and parallelization,
sparse gradients, and
non-smooth loss functions
all while retaining the optimal adaptive learning rates. All of these issues are of practical importance: minibatch parallelization has strong diminishing returns, but in combination with sparse gradients and adaptive learning rates, we show how that effect is drastically mitigated. The importance of robustly dealing with non-smooth loss functions is also a very practical concern: a growing number of learning architectures employ non-smooth nonlinearities, like absolute value normalization or rectified-linear units. Our final algorithm addresses all of these, while remaining simple to implement and of linear complexity.
There are a number of adaptive settings for SGD learning rates, or equivalently, diagonal preconditioning schemes, to be found in the literature, e.g., [2, 3, 4, 5, 6, 7]. Their aim is generally to increase performance on stochastic optimization tasks, a concern complementary to our focus of producing an algorithm that works robustly without any hyper-parameter tuning. Many of those adaptive schemes produce monotonically decreasing rates, however, which makes them unsuitable for non-stationary tasks.
The remainder of this paper builds upon the adaptive learning rate scheme of [1], which is not monotonically decreasing, so we recapitulate its main results here. Using an idealized quadratic and separable loss function, it is possible to derive an optimal learning rate schedule which preserves the convergence guarantees of SGD. When the problem is approximately separable, the analysis simplifies, as all quantities are one-dimensional; it also holds as a local approximation in the non-quadratic but smooth case.
In the idealized case, and for each dimension i, the optimal learning rate can be derived analytically, and takes the following form:

η*_i = (θ_i − θ*_i)² / ( h_i · [ (θ_i − θ*_i)² + σ_i² ] )    (1)

where (θ_i − θ*_i) is the distance to the optimal parameter value, and σ_i² and h_i are the local sample variance and curvature, respectively.
We use an exponential moving average with time-constant τ_i (the approximate number of samples considered from recent memory) for online estimates of the quantities in equation 1:

ḡ_i ← (1 − 1/τ_i) · ḡ_i + (1/τ_i) · ∇_i
v̄_i ← (1 − 1/τ_i) · v̄_i + (1/τ_i) · (∇_i)²
h̄_i ← (1 − 1/τ_i) · h̄_i + (1/τ_i) · |bbprop(θ)_i|
where the diagonal Hessian entries bbprop(θ)_i are computed using the ‘bbprop’ procedure [8], and the time-constant (memory) τ_i is adapted according to how large a step was taken:

τ_i ← (1 − ḡ_i²/v̄_i) · τ_i + 1
The final algorithm is called vSGD, and uses the learning rates from equation 1 to update the parameters (element-wise):

θ_i ← θ_i − (ḡ_i² / (h̄_i · v̄_i)) · ∇_i
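For concreteness, one such element-wise vSGD step can be sketched in NumPy (an illustrative reconstruction: the function signature and the `state` container are our naming, not part of the original presentation):

```python
import numpy as np

def vsgd_update(theta, grad, curv, state):
    """One element-wise vSGD update (sketch).

    theta, grad, curv: parameters, sample gradient, and diagonal
    (bbprop-style) curvature estimate for the current sample.
    state holds the moving averages g_avg, v_avg, h_avg and the
    per-dimension time-constants tau.
    """
    g, v, h, tau = state["g_avg"], state["v_avg"], state["h_avg"], state["tau"]
    f = 1.0 / tau                          # per-dimension forgetting factor
    g[:] = (1 - f) * g + f * grad          # mean-gradient estimate
    v[:] = (1 - f) * v + f * grad**2       # second-moment estimate
    h[:] = (1 - f) * h + f * np.abs(curv)  # curvature estimate
    tau[:] = (1 - g**2 / v) * tau + 1      # large steps shrink the memory
    eta = g**2 / (h * v)                   # adaptive learning rates
    return theta - eta * grad
```

Dimensions where the mean gradient dominates its variance get rates close to the deterministic 1/h̄_i, while noisy dimensions are automatically slowed down.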
3 Parallelization with minibatches
Compared to pure online SGD, computation time can be reduced by “minibatch” parallelization: n sample-gradients are computed (simultaneously, e.g., on multiple cores) and then a single update is performed on the resulting averaged minibatch gradient:

∇̄_i = (1/n) · Σ_{j=1..n} ∇_i^(j)    (2)
While the minibatch size n can be seen as a hyper-parameter of the algorithm, it is often constrained to a large extent by the computational hardware, memory requirements and communication bandwidth. A derivation just like the one that led to equation 1 can be used to determine the optimal learning rates automatically, for an arbitrary minibatch size n. The key difference is that the averaging in equation 2 reduces the effective variance by a factor n, leading to:

η*_i(n) = n · ḡ_i² / ( h̄_i · [ v̄_i + (n − 1) · ḡ_i² ] )    (3)
This expresses the intuition that using minibatches reduces the sample noise, in turn permitting larger step sizes: if the noise (or sample diversity) is small, those gains are minimal; if it is large, they are substantial (see Figure 1, left). Varying minibatch sizes tend to be impractical to implement, however (although, if the implementation and computational architecture are flexible enough, the variance term of the learning rate can also be used to adapt the minibatch size to its optimal trade-off), and so common practice is to simply fix a minibatch size, and then re-tune the learning rates (by a factor between 1 and n). With our adaptive minibatch-aware scheme (equation 3) this is no longer necessary: in fact, we get an automatic transition from initially small effective minibatches (by means of the learning rates) to large minibatches toward the end, when the relative noise level is higher.
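The dependence of the rate in equation 3 on the minibatch size can be sketched as follows (a minimal illustration; the function name is ours):

```python
def minibatch_rate(g_avg, v_avg, h_avg, n):
    """Element-wise adaptive learning rate for minibatch size n.

    A minibatch of size n shrinks the effective gradient variance by a
    factor n, which permits a proportionally larger step when the noise
    dominates the signal.
    """
    return n * g_avg**2 / (h_avg * (v_avg + (n - 1) * g_avg**2))
```

For n = 1 this reduces to the original vSGD rate ḡ²/(h̄·v̄); as n grows it approaches the noise-free rate 1/h̄ from below.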
4 Sparse gradients
Many common learning architectures (e.g., those using rectified linear units, or sparsity penalties) lead to sample gradients that are increasingly sparse, that is, non-zero only in a small fraction of the problem dimensions. It is possible to exploit this to speed up learning, by averaging many sparse gradients in a minibatch, or by doing asynchronous updates [10].
Here, we investigate how to set the learning rates in the presence of sparsity, and our result is simply based on the observation that doing an update using a set of sparse gradients is equivalent to doing the same update, but with a smaller effective minibatch size, while ignoring all the zero entries.
We can do this again on an element-by-element basis, where we define n_i to be the number of non-zero entries in dimension i within the current minibatch. In each dimension, we rescale the minibatch gradient accordingly by a factor n/n_i, and at the same time reduce the learning rate to reflect the smaller effective minibatch size n_i. Compounding those two effects gives the optimal learning rate for sparse minibatches (we ignore the case n_i = 0, when there is no update):

η*_i(n_i) = n_i · ḡ_i² / ( h̄_i · [ v̄_i + (n_i − 1) · ḡ_i² ] )    (4)
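The compound effect of the per-dimension gradient rescaling and the reduced effective minibatch size can be sketched as follows, with the n sample gradients stacked into an (n, d) array (names and data layout are our assumptions):

```python
import numpy as np

def sparse_minibatch_step(grads, g_avg, v_avg, h_avg):
    """Sparse-minibatch update direction (sketch).

    grads: (n, d) array of sample gradients, possibly sparse.
    Per dimension i, n_i counts the non-zero sample gradients; the
    minibatch gradient is rescaled by n/n_i and the adaptive rate uses
    the smaller effective minibatch size n_i. Dimensions with n_i = 0
    receive no update.
    """
    n, d = grads.shape
    n_i = np.count_nonzero(grads, axis=0)
    mean_grad = grads.mean(axis=0)
    step = np.zeros(d)
    active = n_i > 0
    ni = n_i[active]
    # n_i-adjusted adaptive rate, applied to the rescaled gradient.
    eta = ni * g_avg[active]**2 / (
        h_avg[active] * (v_avg[active] + (ni - 1) * g_avg[active]**2))
    step[active] = eta * (n / ni) * mean_grad[active]
    return step
```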
Figure 1 shows how using minibatches with such adaptive learning rates reduces the impact of diminishing returns if the sample gradients are sparse. In other words, with the right learning rates, higher sparsity can be directly translated into higher parallelizability.
An alternative to computing n_i anew for each minibatch (and each dimension) would be to just use the long-term average sparsity instead. Figure 2 shows that this is suboptimal, especially if the noise level is small, and in the regime where each minibatch is expected to contain just a few non-zero entries. This figure also shows that equation 4 produces a higher relative gain compared to the outer envelope of the performance of all fixed learning rates.
4.1 Orthogonal gradients
One reason for the boost in parallelizability when the gradients are sparse comes from the fact that sparse gradients are mostly orthogonal, allowing independent progress in each direction. But sparse gradients are in fact a special case of orthogonal gradients, for which we can obtain similar speed-ups with a reweighting of the minibatch gradients:

∇̃ = Σ_{j=1..n} ∇^(j) / c_j,    where  c_j = Σ_{k=1..n} |⟨∇^(j), ∇^(k)⟩| / (‖∇^(j)‖ · ‖∇^(k)‖)
In other words, each sample is weighted by one over the (smoothed) number of times its gradient interferes (is non-orthogonal) with another sample’s gradient.
In the limit, this scheme simplifies to the sparse-gradient cases discussed above: if all sample gradients are aligned, they are averaged (reweighted by 1/n, corresponding to the dense case in equation 2), and if all sample gradients are orthogonal, they are summed (reweighted by 1, corresponding to the maximally sparse case in equation 4). See Figure 3 for an illustration.
In practice, this reweighting comes at a certain cost, increasing the computational expense of a single iteration from O(n·d) to O(n²·d), where d is the problem dimension. In other words, it is only likely to be viable if the forward-backward passes of the gradient computation are non-trivial, or if the minibatch size is small.
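Such a reweighting can be sketched as follows, assuming interference between two sample gradients is measured by the absolute cosine of the angle between them (the precise interference measure is our assumption; the two limiting cases below match the behaviour described in the text):

```python
import numpy as np

def orthogonality_weights(grads, eps=1e-12):
    """Interference-reweighted combination of sample gradients (sketch).

    c_j smoothly counts how many sample gradients are non-orthogonal to
    sample j (via absolute cosine similarities, including j itself); the
    combined gradient sums each sample gradient divided by its count.
    Aligned gradients are averaged (c_j = n); mutually orthogonal
    gradients are summed (c_j = 1). Cost: O(n^2 d) pairwise products.
    """
    norms = np.linalg.norm(grads, axis=1) + eps
    unit = grads / norms[:, None]
    cos = np.abs(unit @ unit.T)   # pairwise |cosine| similarities
    c = cos.sum(axis=1)           # smoothed interference counts
    return (grads / c[:, None]).sum(axis=0)
```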
5 Non-smooth losses
Many commonly used non-linearities (rectified linear units, absolute value normalization, etc.) produce non-smooth sample loss functions. However, when optimizing over a distribution of samples (or just a large enough dataset), the variability between samples can lead to a smooth expected loss function, even though each sample has a non-smooth contribution. Figure 4 illustrates this point for samples that have an absolute value or a rectified linear contribution to the loss.
It is clear from this observation that, if the sample losses are non-smooth, it is not possible to reliably estimate the curvature of the true expected loss function from the curvature of the individual sample losses (which are all zero in the two examples above). This means that our previous approach of estimating the curvature term h_i in the optimal learning rate expression by a moving average of sample curvatures, as estimated by the “bbprop” procedure [8] (which computes a Gauss-Newton approximation of the diagonal Hessian, at the cost of one additional backward pass), is limited to smooth sample loss functions, and we need a different approach for the general case (this also reduces implementation effort, e.g., when using third-party software that does not implement bbprop).
5.1 Finite-difference curvature
A good estimate of the relevant curvature for our purposes (i.e., for determining a good learning rate) is not to compute the true Hessian at the current point, but to take the expectation over noisy finite-difference steps, where those steps are on the same scale as the actually performed update steps, because this is the regime we care about.
In practice, we obtain this finite-difference estimate by computing two gradients of the same sample loss, at points differing by the typical update distance (of course, this estimate does not need to be computed at every step, which can save computation time):

h_i^fd = | ( ∇_i(θ + Δ) − ∇_i(θ) ) / Δ_i |

where Δ is the last performed update step. This approach is related to the diagonal Hessian preconditioning in SGD-QN [11], but the step-difference used is different, and the moving average scheme there decays with time, which loses the suitability for non-stationary problems.
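A minimal sketch of this finite-difference curvature estimate (the names are ours; `grad_fn` stands for one sample's gradient computation):

```python
import numpy as np

def fd_curvature(grad_fn, theta, step, eps=1e-12):
    """Finite-difference diagonal curvature estimate (sketch).

    grad_fn: gradient of a single sample loss; step: a typical update
    step (e.g., the last performed one). Two gradients of the *same*
    sample, evaluated a typical update-distance apart, estimate the
    curvature at the scale the optimizer actually operates on.
    """
    g1 = grad_fn(theta)
    g2 = grad_fn(theta + step)
    return np.abs((g2 - g1) / (step + eps))
```

On a quadratic sample loss with curvature h, this recovers h exactly regardless of the step size used.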
5.2 Curvature variability
To further increase robustness, we reuse the intuition that originally motivated vSGD, and take into account the variance of the curvature estimates produced by the finite-difference method. To reduce the likelihood of becoming overconfident (underestimating curvature, i.e., overestimating learning rates), we use a variance-normalization based on the signal-to-noise ratio of the curvature estimates.
For this purpose we maintain two additional moving averages, tracking the first and second moments of the finite-difference estimates:

h̄_i ← (1 − 1/τ_i) · h̄_i + (1/τ_i) · |h_i^fd|
q̄_i ← (1 − 1/τ_i) · q̄_i + (1/τ_i) · (h_i^fd)²

and then compute the curvature term simply as q̄_i / h̄_i. This ratio is never smaller than h̄_i, and grows with the variability of the estimates, so noisy curvatures conservatively reduce the learning rates.
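Assuming the two moving averages track the first and second moments of the finite-difference estimates (our reading of the construction), the variance-normalized curvature term can be sketched as:

```python
def robust_curvature(h_avg, q_avg):
    """Variance-normalized curvature term (sketch, assuming the term
    is the second moment of the curvature estimates over the first).

    Since E[h^2]/E[h] >= E[h], noisy curvature estimates yield a
    conservatively larger curvature, hence a smaller learning rate.
    """
    return q_avg / h_avg
```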
5.3 Outlier detection
If an outlier sample is encountered while the time-constant τ_i is close to one (i.e., the history is mostly discarded from the moving averages at each update), this has the potential to disrupt the optimization process. Here, the statistics we keep for the adaptive learning rates have an additional, unforeseen benefit: they make it trivial to detect outliers.
The outlier’s effect can be mitigated relatively simply by increasing the time-constant before incorporating the sample into the statistics (to make sure old samples are not forgotten); then, as the perceived variance shoots up, the learning rate is automatically reduced. If it was not an outlier, but a genuine change in the data distribution, the algorithm will quickly adapt and increase the learning rates again.
In practice, we use a detection threshold of two standard deviations, and increase the corresponding τ_i by one (see pseudocode).
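The outlier heuristic can be sketched as follows (an illustrative reconstruction; only the outlier-related part of the statistics update is shown, and the names are ours):

```python
import numpy as np

def incorporate_sample(grad, state, k=2.0):
    """Outlier-aware moving-average update (sketch of the heuristic).

    If the sample gradient lies more than k standard deviations from
    the running mean, the time-constant is increased by one *before*
    the sample enters the statistics, so the history is not forgotten;
    the inflated variance then automatically lowers the learning rate.
    """
    g, v, tau = state["g_avg"], state["v_avg"], state["tau"]
    std = np.sqrt(np.maximum(v - g**2, 0.0))
    outlier = np.abs(grad - g) > k * std
    tau[outlier] += 1.0
    f = 1.0 / tau
    g[:] = (1 - f) * g + f * grad
    v[:] = (1 - f) * v + f * grad**2
```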
Algorithm 1 gives the explicit pseudocode for this finite-difference estimation, in combination with the minibatch-size-adjusted rates from equation 3, termed “vSGD-fd”. Initialization is akin to that of vSGD, in that all moving averages are bootstrapped on a few samples (10) before any updates are done. It is also wise to add a tiny ε term where necessary to avoid divisions by zero.
An algorithm that has the ambition to work out-of-the-box, without any tuning of hyper-parameters, must be able to pass a number of elementary tests: those may not be sufficient, but they are necessary. To that purpose, we set up a collection of elementary (one-dimensional) stochastic optimization test cases, varying the shape of the sample loss functions, their curvature h, and the noise level of the samples x_j, which are drawn from a normal distribution. We vary curvature and noise level over three settings each, spanning two orders of magnitude, giving 9 settings for each of the 4 loss-function shapes. To visualize the large number of results, we summarize each test-case and algorithm combination in a concise heatmap square (see Figure 5 for the full explanation).
In Figure 6, we show the results for all test cases on a range of algorithms and minibatch sizes. Each square shows the gain in loss for 100 independent runs of 1024 updates each. Each group of columns corresponds to one of the four functions, with the 9 inner columns using different curvature and noise level settings. Color scales are identical for all heatmaps within a column, but not across columns. Each group of rows corresponds to one algorithm, with each row using a different hyper-parameter setting, namely initial learning rates (for SGD, AdaGrad [6] and the natural gradient [7]) and decay rate for SGD. All rows come in pairs, with the upper one using pure SGD (minibatch size n = 1) and the lower one using minibatches.
The findings are clear: in contrast to the other algorithms tested, vSGD-fd does not require any hyper-parameter tuning to give reliably good performance on the broad range of tests: the learning rates adapt automatically to different curvatures and noise levels. And in contrast to the predecessor vSGD, it also deals with non-smooth loss functions appropriately. The learning rates are adjusted automatically according to the minibatch size, which improves convergence speed on the noisier test cases (3 left columns), where there is a larger potential gain from minibatches.
The earlier variant (vSGD) was shown to work very robustly on a broad range of real-world benchmarks and non-convex, deep neural network-based loss functions [1]. We expect those results on smooth losses to transfer directly to vSGD-fd. This bodes well for future work that will determine its performance on real-world non-smooth problems.
We have presented a novel variant of SGD with adaptive learning rates that expands on previous work in three directions. The adaptive rates properly take into account the minibatch size, which in combination with sparse gradients drastically alleviates the diminishing returns of parallelization. Also, the curvature estimation procedure is based on a finite-difference approach that can deal with non-smooth sample loss functions. The final algorithm integrates these components, has linear complexity and is hyper-parameter free. Unlike other adaptive schemes, it works on a broad range of elementary test cases, the necessary condition for an out-of-the-box method.
Future work will investigate how to adjust the presented element-wise approach to highly non-separable problems (tightly correlated gradient dimensions), potentially relying on a low-rank or block-decomposed estimate of the gradient covariance matrix, as in TONGA [12].
The authors want to thank Sixin Zhang, Durk Kingma, Daan Wierstra, Camille Couprie, Clément Farabet and Arthur Szlam for helpful discussions. We also thank the reviewers for helpful suggestions, and the ‘Open Reviewing Network’ for perfectly managing the novel open and transparent reviewing process. This work was funded in part through AFR postdoc grant number 2915104, of the National Research Fund Luxembourg.
[1] Schaul, T, Zhang, S, and LeCun, Y. No more pesky learning rates. Technical report, June 2012.
[2] Jacobs, R. A. Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4):295–307, January 1988.
[3] Almeida, L and Langlois, T. Parameter adaptation in stochastic optimization. On-line learning in neural …, 1999.
[4] George, A. P and Powell, W. B. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning, 65(1):167–198, May 2006.
[5] Le Roux, N and Fitzgibbon, A. A fast natural Newton method. In Proceedings of the International Conference on Machine Learning (ICML), 2010.
[6] Duchi, J. C, Hazan, E, and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. 2010.
[7] Amari, S, Park, H, and Fukumizu, K. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 12(6):1399–1409, 2000.
[8] LeCun, Y, Bottou, L, Orr, G, and Muller, K. Efficient backprop. In Orr, G and Muller, K, editors, Neural Networks: Tricks of the Trade. Springer, 1998.
[9] Byrd, R, Chin, G, Nocedal, J, and Wu, Y. Sample size selection in optimization methods for machine learning. Mathematical Programming, 2012.
[10] Niu, F, Recht, B, Re, C, and Wright, S. J. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, 2011.
[11] Bordes, A, Bottou, L, and Gallinari, P. SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research, 10:1737–1754, July 2009.
[12] Le Roux, N, Manzagol, P, and Bengio, Y. Topmoumoute online natural gradient algorithm. In Advances in Neural Information Processing Systems, 2008.