Stochastic gradient descent (SGD) remains the workhorse for many practical problems. However, it converges slowly and can be difficult to tune. It is possible to precondition SGD to accelerate its convergence remarkably, but many attempts in this direction either aim at solving specialized problems or result in methods significantly more complicated than SGD. This paper proposes a new method to estimate a preconditioner such that the amplitudes of perturbations of the preconditioned stochastic gradient match those of the perturbations of the parameters to be optimized, in a way comparable to the Newton method for deterministic optimization. Unlike preconditioners based on secant equation fitting, as used in deterministic quasi-Newton methods, which assume a positive definite Hessian and approximate its inverse, the new preconditioner works equally well for both convex and non-convex optimization with exact or noisy gradients. When a stochastic gradient is used, it naturally damps the gradient noise to stabilize SGD. Efficient preconditioner estimation methods are developed, and with reasonable simplifications they are applicable to large-scale problems. Experimental results demonstrate that, equipped with the new preconditioner and without any tuning effort, preconditioned SGD can efficiently solve many challenging problems, such as the training of a deep neural network or of a recurrent neural network requiring extremely long-term memories.


## I Introduction

Stochastic gradient descent (SGD) has a long history in signal processing and machine learning [2, 3, 1, 4, 5, 20, 21]. In adaptive signal processing, an exact gradient might be unavailable in a time-varying setting, and typically it is replaced with the instantaneous gradient, a stochastic gradient with mini-batch size 1 [2, 1]. In machine learning, e.g., the training of neural networks, deterministic gradient descent is either expensive when the training data are large, or unnecessary when the training samples are redundant [5]. SGD remains a popular choice due to its simplicity and proven efficiency in solving large-scale problems. However, SGD may converge slowly and is difficult to tune, especially for large-scale problems. When the Hessian matrix is available and small, second order optimization methods like the Newton method might be the best choice, but in practice this is seldom the case. For example, calculation of the Hessian of a fairly standard feedforward neural network can be much more complicated than its gradient evaluation [6]. In deterministic optimization, quasi-Newton methods and (nonlinear) conjugate gradient methods are among the most popular choices, and they converge fast once the solution is located in a basin of attraction. However, these methods require a line search step, which can be problematic when the cost function cannot be efficiently evaluated, a typical scenario where SGD is used. Still, they have been applied with success to machine learning problems like neural network training, and are available in standard toolboxes, e.g., the Matlab neural network toolbox [7]. The highly specialized Hessian-free neural network training methods in [8] represent the state of the art in this direction. There are attempts to adapt deterministic quasi-Newton methods to stochastic optimization [10, 9, 11]. However, due to the existence of gradient noise and the infeasibility of line search in stochastic optimization, the resultant methods either impose strong restrictions, such as convexity, on the target problems, or are significantly more complicated than SGD. On the other hand, numerous specialized SGD methods have been developed for different applications. In blind source separation and independent component analysis, the relative (natural) gradient is proposed to replace the regular gradient in SGD [20, 21]. However, in general, natural gradient descent using metrics like Fisher information can be as problematic as the Newton method for large-scale problems. In neural network training, a number of specialized methods have been developed to improve the convergence of SGD; to name a few: the classic momentum method and Nesterov's accelerated gradient, the RMSProp method and its variations, various step size control strategies, pre-training, clever initialization, and a few recent methods providing element-wise learning rates [12, 13, 14, 15, 16, 17]. Clearly, we need a stochastic optimization method that is as simple and widely applicable as SGD, converges as fast as a second order method, and, last but not least, is user friendly, requiring little tuning effort.

In this paper, we carefully examine SGD in the general non-convex optimization setting. We show that it is possible to design a preconditioner that works well for both convex and non-convex problems, and that such a preconditioner can be estimated exclusively from the noisy gradient information. However, when the problem is non-convex, the preconditioner cannot be estimated in the conventional way of Hessian estimation used in quasi-Newton methods. For general non-convex optimization, a new method is required to estimate the preconditioner, which is not necessarily the inverse of the Hessian, but is related to it. This new preconditioner has several desirable properties. It reduces the eigenvalue spread of preconditioned SGD to speed up convergence, scales the stochastic gradient in a way comparable to the Newton method so that step size selection is trivial, and, lastly, it has a built-in gradient noise suppression mechanism to stabilize preconditioned SGD when the gradient is heavily noisy. Practical implementation methods are developed, and applications to a number of interesting problems demonstrate the usefulness of our methods.

## II Background

### II-A SGD

Although this paper focuses on stochastic optimization, deterministic gradient descent can be viewed as a special case of SGD without gradient noise, and the theories and methods developed in this paper apply to it as well. Here, we consider the minimization of the cost function

 f(θ) = E[ℓ(θ, z)], (1)

where θ is the parameter vector to be optimized, z is a random vector, ℓ is a loss function, and E takes expectation over z. For example, in the problem of classification using a neural network, θ represents the vector containing all the tunable weights of the network, z is the pair of feature vector x and class label y, and ℓ typically is a differentiable loss such as the mean squared error, the cross entropy loss, or the (multi-class) hinge loss. Gradient descent can be used to learn θ as

 θ[new] = θ[old] − μg(θ[old]), (2)

where μ is a positive step size, and

 g(θ) = E[∂ℓ(θ, z)/∂θ] (3)

is the gradient of f with respect to θ. Evaluating the expectation in (3) may be undesirable or impossible. In SGD, this expectation is approximated with a sample average, leading to the following update rule for θ,

 ^g(θ) = (1/n) ∑_{i=1}^{n} ∂ℓ(θ, z_i)/∂θ, (4)
 θ[new] = θ[old] − μ^g(θ[old]), (5)

where the hat suggests that the variable under it is estimated, n is the mini-batch size, and z_i denotes the ith sample, typically drawn randomly from the training data. Now ^g(θ) is a random vector, and it is useful to rewrite it as

 ^g(θ) = g(θ) + ϵ′ (6)

to clearly show its deterministic and random parts, where the random vector ϵ′ models the approximation error due to replacing the expectation with a sample average. Although popular due to its simplicity, SGD may converge slowly, and the selection of μ is nontrivial, as revealed in the following subsection.
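As a concrete illustration of update rule (5), the sketch below runs mini-batch SGD on a hypothetical least-squares loss; the data sizes, mini-batch size, and step size are ad hoc choices for this toy example, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: least-squares loss, so the mini-batch gradient (4)
# is an average of per-sample gradients.
A = rng.standard_normal((100, 5))
theta_true = rng.standard_normal(5)
y = A @ theta_true + 0.01 * rng.standard_normal(100)

def stochastic_grad(theta, idx):
    # hat{g}(theta) = (1/n) sum_i d l(theta, z_i)/d theta over mini-batch idx
    X, t = A[idx], y[idx]
    return X.T @ (X @ theta - t) / len(idx)

theta = np.zeros(5)
mu = 0.05                                 # step size; selecting it is the hard part
for _ in range(2000):
    idx = rng.choice(100, size=10, replace=False)     # random mini-batch, n = 10
    theta = theta - mu * stochastic_grad(theta, idx)  # update rule (5)
```

Even on this well-conditioned toy problem, the step size μ had to be chosen by trial and error, which is exactly the tuning burden discussed above.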

### II-B Second Order Approximation

We consider the following second order approximation of f(θ) around a point θ_0,

 f(θ) ≈ f(θ_0) + g_0^T(θ − θ_0) + (1/2)(θ − θ_0)^T H_0 (θ − θ_0), (7)

where the superscript T denotes transpose, g_0 is the gradient of f at θ_0, and

 H_0 = ∂²f(θ)/(∂θ^T ∂θ) |_{θ=θ_0}

is the Hessian matrix at θ_0. Note that H_0 is symmetric by its definition. With this approximation, the gradients of f with respect to θ around θ_0 can be evaluated as

 g(θ) ≈ g_0 + H_0(θ − θ_0), (8)
 ^g(θ) = g_0 + H_0(θ − θ_0) + ϵ, (9)

where ϵ contains the errors introduced in both (6) and (8). Using (9), around θ_0, the learning rule (5) turns into the following linear system,

 θ[new] = (I − μH_0)θ[old] − μ(g_0 − H_0θ_0 + ϵ), (10)

where I is a conformable identity matrix. The behaviors of such a linear system largely depend on the selection of μ and on the distribution of the eigenvalues of H_0. In practice, this linear system may have a large dimension and be ill-conditioned. Furthermore, little is known about H_0. The selection of μ is largely based on trial and error, and still, the convergence may be slow. However, it is possible to precondition such a linear system to accelerate its convergence remarkably, as shown in the next section.

## III Preconditioned SGD

A preconditioned SGD is defined by

 θ[new] = θ[old] − μP^g(θ[old]), (11)

where P is a conformable matrix called the preconditioner. The SGD in (5) is the special case of (11) with P = I, and we shall call it the plain SGD. Noting that our target is to minimize the cost function f(θ), P must be positive definite, and symmetric as well by convention, such that −P^g(θ) always points in a descent direction. In convex optimization, the inverse of the Hessian is a popular preconditioner. However, in general, the Hessian is not easy to obtain, is not always positive definite, and, for SGD, such an inverse-of-Hessian preconditioner may significantly amplify the gradient noise, especially when the Hessian is ill-conditioned. In this section, we detail the behaviors of preconditioned SGD in a general setting where the problem can be non-convex and P is not necessarily related to the Hessian.
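A minimal sketch of the update (11), with arbitrary example matrices, shows why positive definiteness of P matters: the directional derivative of the cost along −P^g is −^g^T P ^g < 0 whenever ^g ≠ 0, so the preconditioned step is always a descent step:

```python
import numpy as np

rng = np.random.default_rng(1)

B = rng.standard_normal((6, 6))
P = B @ B.T + 6 * np.eye(6)        # symmetric positive definite preconditioner
g = rng.standard_normal(6)         # a (stochastic) gradient at theta_old

# Directional derivative of f along -P g is -g^T P g < 0 for any g != 0,
# so the preconditioned step always points in a descent direction.
descent_rate = -g @ P @ g

theta_old = rng.standard_normal(6)
mu = 0.01
theta_new = theta_old - mu * P @ g  # preconditioned update (11); P = I is plain SGD
```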

### Iii-a Convergence of Preconditioned SGD

We consider the same second order approximation given in (7). With preconditioning, the learning rule in (10) becomes

 θ[new] = (I − μPH_0)θ[old] − μP(g_0 − H_0θ_0 + ϵ). (12)

To proceed, let us first prove the following statement.

Proposition 1: All the eigenvalues of PH_0 are real, and, for nonzero eigenvalues, their signs are the same as those of the eigenvalues of H_0.

Proof: Let P^{0.5} denote the principal square root of P. Since P is positive definite, P^{0.5} is symmetric and positive definite as well. First, we show that PH_0 and P^{0.5}H_0P^{0.5} have the same eigenvalues. Supposing v is an eigenvector of P^{0.5}H_0P^{0.5} associated with eigenvalue λ, i.e., P^{0.5}H_0P^{0.5}v = λv, then we have

 PH_0(P^{0.5}v) = P^{0.5}P^{0.5}H_0P^{0.5}v = λP^{0.5}v.

As P^{0.5} is positive definite, P^{0.5}v ≠ 0. Thus P^{0.5}v is an eigenvector of PH_0 associated with eigenvalue λ. Similarly, if v is an eigenvector of PH_0 associated with eigenvalue λ, then by rewriting PH_0v = λv as

 (P^{0.5}H_0P^{0.5})(P^{−0.5}v) = λP^{−0.5}v,

we see that λ is an eigenvalue of P^{0.5}H_0P^{0.5} as well. Hence PH_0 and P^{0.5}H_0P^{0.5} have identical eigenvalues, and they are real as P^{0.5}H_0P^{0.5} is symmetric. Second, the matrices H_0 and P^{0.5}H_0P^{0.5} are congruent, and thus their nonzero eigenvalues have the same signs, which implies that the nonzero eigenvalues of PH_0 and H_0 have the same signs as well.

Basically, Proposition 1 states that preconditioning does not change the local geometric structure around θ_0 in the sense that a local minimum, maximum, or saddle point before preconditioning remains a local minimum, maximum, or saddle point after preconditioning, as shown in the numerical example given in Fig. 1.
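Proposition 1 is easy to check numerically. The sketch below builds a random positive definite P and a random indefinite symmetric H_0 (sizes and seed are arbitrary), and confirms that the eigenvalues of PH_0 are real with the same signs as those of H_0:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

B = rng.standard_normal((n, n))
P = B @ B.T + n * np.eye(n)          # random symmetric positive definite P
C = rng.standard_normal((n, n))
H0 = (C + C.T) / 2                   # random symmetric, generally indefinite H0

eig_H = np.sort(np.linalg.eigvalsh(H0))
eig_PH = np.sort(np.linalg.eigvals(P @ H0).real)  # real up to round-off
```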

By introducing the eigenvalue decomposition

 PH_0 = VDV^{−1},

we can rewrite (12) as

 ϑ[new] = (I − μD)ϑ[old] − μV^{−1}P(g_0 − H_0θ_0 + ϵ), (13)

where the diagonal matrix D and the nonsingular matrix V contain the eigenvalues and eigenvectors of PH_0 respectively, and ϑ = V^{−1}θ defines a new parameter vector in the transformed coordinates. Since D is diagonal, (13) suggests that each dimension of ϑ evolves independently. This greatly simplifies the study of preconditioned SGD. Without loss of generality, we consider the ith dimension of ϑ,

 ϑ[new]_i = (1 − μd_i)ϑ[old]_i − μh_i − μv_i, (14)

where ϑ_i and h_i are the ith elements of ϑ and V^{−1}P(g_0 − H_0θ_0) respectively, d_i is the ith diagonal element of D, and the random variable v_i is the ith element of the random vector V^{−1}Pϵ. We consider the following three cases.

1) d_i > 0: By choosing μ < 1/d_i, we have |1 − μd_i| < 1. Then repeatedly applying (14) will let the expectation of ϑ_i converge to −h_i/d_i. When the variance of v_i is bounded, the variance of ϑ_i is bounded as well.

2) d_i < 0: For any step size μ > 0, we have |1 − μd_i| > 1. Iteration (14) always pushes ϑ_i away from −h_i/d_i.

3) d_i = 0: This should be a transient state, since ϑ_i always drifts away from such a singular point due to gradient noise.

A similar picture holds for the convergence of θ as well. When H_0 is positive definite, the diagonal elements of D are positive according to Proposition 1. Then by choosing μ < 1/λ_max(PH_0), repeated applications of iteration (13) let the expectation of θ converge to θ_0 − H_0^{−1}g_0, a local minimum of f around θ_0, where λ_max(·) denotes the maximum eigenvalue. When H_0 is negative definite, θ_0 is located around a local maximum of f, and iteration (13) pushes the expectation of θ away from it. When H_0 has both positive and negative eigenvalues, by choosing μ < 1/max_i|d_i|, the part of ϑ associated with positive eigenvalues is attracted to its stationary point, while the part associated with negative eigenvalues is repelled away from it. Such a saddle point is unstable due to gradient noise.

### III-B Three Desirable Properties of a Preconditioner

We expect a good preconditioner to have the following three desirable properties.

#### III-B1 Small eigenvalue spread

In order to achieve approximately uniform convergence rates on all the coordinates of ϑ, all the eigenvalues of PH_0 should have similar amplitudes. We use the standard deviation of the logarithms of the absolute eigenvalues of a matrix to measure its eigenvalue spread. The eigenvalue spread gain of a preconditioner P is defined as the ratio of the eigenvalue spread of H_0 to the eigenvalue spread of PH_0. A larger eigenvalue spread gain is preferred, and, as a baseline, plain SGD has an eigenvalue spread gain of 1.
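The eigenvalue spread measure can be sketched as follows; the ill-conditioned H_0 is a made-up example. With the Newton-style choice P = H_0^{−1}, the spread of PH_0 collapses to round-off level, i.e., the eigenvalue spread gain is very large:

```python
import numpy as np

def eig_spread(M):
    # standard deviation of the logarithms of the absolute eigenvalues of M
    return np.std(np.log(np.abs(np.linalg.eigvals(M))))

rng = np.random.default_rng(3)
Qm, _ = np.linalg.qr(rng.standard_normal((5, 5)))
H0 = Qm @ np.diag([1e-3, 1e-1, 1.0, 1e1, 1e3]) @ Qm.T  # ill-conditioned Hessian

P = np.linalg.inv(H0)             # Newton-style preconditioner
spread_plain = eig_spread(H0)     # eigenvalue spread seen by plain SGD (P = I)
spread_pre = eig_spread(P @ H0)   # spread after preconditioning, near zero
# eigenvalue spread gain = spread_plain / spread_pre >> 1
```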

#### III-B2 Normalized eigenvalue amplitudes

We hope that all the absolute eigenvalues of PH_0 are close to 1 to facilitate the step size selection in preconditioned SGD. Note that in deterministic optimization, the step size can be determined by line search, and thus the scales of the eigenvalues are of less interest. However, in stochastic optimization, without the help of line search and knowledge of the Hessian, step size selection can be tricky. We use the mean absolute eigenvalue of PH_0 to measure the scaling effect of P. In a well preconditioned SGD, a normalized step size, i.e., 0 < μ ≤ 1, should work well for the whole learning process, eliminating any manual step size tweaking effort.

#### III-B3 Large stochastic gradient noise suppression gain

Unlike in deterministic optimization, preconditioning for SGD comes at a price: amplification of gradient noise. For the plain SGD in (5), the gradient noise energy is E[(ϵ′)^Tϵ′]; for the preconditioned SGD in (11), this noise energy is E[(ϵ′)^TP²ϵ′]. We use the preconditioned gradient noise energy of the Newton method, which is E[(ϵ′)^TH_0^{−2}ϵ′], as the reference, and define the noise suppression gain of a preconditioner P as

 E[(ϵ′)^T H_0^{−2} ϵ′] / E[(ϵ′)^T P² ϵ′] = tr{H_0^{−2} E[ϵ′(ϵ′)^T]} / tr{P² E[ϵ′(ϵ′)^T]},

where tr{·} takes the trace of a matrix. When the gradient noise is approximately isotropic, a good approximation of the noise suppression gain is tr(H_0^{−2})/tr(P²).
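Under the isotropic-noise approximation, the gain reduces to tr(H_0^{−2})/tr(P²), which the toy computation below evaluates for two choices of P (the Hessian here is a made-up example). Note that plain SGD, P = I, has a gain far above 1 on an ill-conditioned problem, i.e., it suppresses noise much better than the Newton preconditioner does:

```python
import numpy as np

def noise_gain(P, H0):
    # tr(H0^{-2}) / tr(P^2): noise suppression gain of P under isotropic
    # gradient noise, E[eps' eps'^T] = sigma^2 I
    H0inv = np.linalg.inv(H0)
    return np.trace(H0inv @ H0inv) / np.trace(P @ P)

H0 = np.diag([1e-2, 1.0, 1e2])                   # made-up ill-conditioned Hessian
newton_gain = noise_gain(np.linalg.inv(H0), H0)  # reference gain: exactly 1
plain_gain = noise_gain(np.eye(3), H0)           # plain SGD, P = I
```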

Unfortunately, due to the existence of gradient noise, generally we cannot find a single preconditioner that simultaneously satisfies all our expectations. When the gradient noise vanishes, for a nonsingular H_0 we can indeed find at least one ideal preconditioner such that all the eigenvalues of PH_0 have unitary amplitude, as stated in the following proposition.

Proposition 2: For any nonsingular symmetric H_0, there exists at least one preconditioner P such that all the absolute eigenvalues of PH_0 are unitary, and such a P is unique when H_0 is positive or negative definite.

Proof: First, we show that such a preconditioner exists. Assuming the eigenvalue decomposition of H_0 is H_0 = UΛU^T, we can construct a desired preconditioner as P = U|Λ|^{−1}U^T, where U is an orthogonal matrix, Λ is a diagonal matrix, and |·| takes the element-wise absolute value.

Second, we show that such a preconditioner is unique when H_0 is positive or negative definite. When H_0 is positive definite, according to Proposition 1, PH_0 has positive eigenvalues. If all these eigenvalues are 1, then PH_0 = I, i.e., P = H_0^{−1}. For negative definite H_0, similarly we can show that P = −H_0^{−1}.

However, such a preconditioner is not necessarily unique for an indefinite Hessian. For example, for the Hessian matrix

 H_0 = [ 1  0
         0 −1 ],

any preconditioner having the form

 P = (1/(α² − β²)) [ α²+β²  2αβ
                     2αβ    α²+β² ],  |α| ≠ |β|,

makes PH_0 have unitary absolute eigenvalues.
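The 2×2 example can be verified directly; the values of α and β below are arbitrary, subject only to |α| ≠ |β|:

```python
import numpy as np

alpha, beta = 2.0, 1.0                 # arbitrary values with |alpha| != |beta|
H0 = np.array([[1.0, 0.0],
               [0.0, -1.0]])
P = np.array([[alpha**2 + beta**2, 2 * alpha * beta],
              [2 * alpha * beta, alpha**2 + beta**2]]) / (alpha**2 - beta**2)

abs_eigs = np.abs(np.linalg.eigvals(P @ H0))   # both have amplitude 1
```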

## IV Preconditioner Estimation Criteria

In practice, the gradient may be easy to evaluate, but the Hessian matrix is not. Thus we focus on preconditioner estimation methods that use only the noisy stochastic gradient information. We first discuss two criteria based on secant equation fitting. Although they are not ideal for non-convex optimization, it is still beneficial to study them in detail, as they are intimately related to deterministic and stochastic quasi-Newton methods. We then propose a new preconditioner estimation criterion suitable for both convex and non-convex stochastic optimization, and show how it overcomes the fundamental limitations of secant equation fitting based solutions.

### IV-A Criteria Based on Secant Equation Fitting

Let δθ be a small perturbation of θ around θ_0. From (9), we know that

 ^g(θ + δθ) − ^g(θ) = H_0δθ + ε, (15)

where ε is a random vector accounting for the errors introduced by the stochastic approximation of the gradients and the second order approximation of the cost function. It is proposed to use the same randomly sampled training data to calculate ^g(θ + δθ) and ^g(θ), for two reasons. First, this practice reduces stochastic noise, and makes sure that ^g(θ + δθ) − ^g(θ) → 0 when δθ → 0. Second, it avoids reloading or regenerating training data. To simplify the notation, we rewrite (15) as

 δ^g = H_0δθ + ε, (16)

where δ^g denotes a random perturbation of the stochastic gradient. We call (16) the stochastic secant equation. In quasi-Newton methods, the secant equation is used to derive diverse forms of estimators for the Hessian or its inverse, and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula is among the most widely used ones. In deterministic optimization, BFGS is used along with line search to ensure that the updated Hessian estimate is always positive definite. However, in stochastic optimization, due to the existence of gradient noise and the infeasibility of line search, attempts in this direction have achieved only limited success. One common assumption used to justify (16) for preconditioner estimation is that the true Hessians around θ_0 are positive definite, i.e., the optimization problem is convex or the solution is already in a basin of attraction. This can be a serious limitation in practice, although it greatly simplifies the design of stochastic quasi-Newton methods. Nevertheless, secant equation based Hessian estimators are still widely adopted in both deterministic and stochastic optimization.

#### IV-A1 Criterion 1

With sufficiently many independent pairs (δθ, δ^g) around θ_0, we are able to estimate H_0 by fitting the secant equation (16). It is natural to assume that the error ε is Gaussian distributed, and thus a reasonable criterion for preconditioner estimation is

 c_1(P) = E[‖δ^g − P^{−1}δθ‖²], (17)

where ‖·‖ denotes the Euclidean length of a vector, the associated pairs (δθ, δ^g) are regarded as random vectors, and E takes expectation over them. The preconditioner determined by this criterion is called preconditioner 1. Using the identity

 dP^{−1} = −P^{−1} dP P^{−1}, (18)

we can show that the derivative of c_1(P) with respect to P is

 ∂c_1(P)/∂P = P^{−1}(e_1δθ^T + δθe_1^T)P^{−1}, (19)

where d denotes differentiation and e_1 = δ^g − P^{−1}δθ. Noting that P is symmetric, the gradient of c_1 with respect to P is symmetric as well; equivalently, it is the symmetric part of the gradient calculated without considering the symmetry constraint. By letting the gradient be zero, we can find the optimal P by solving the equation

 R_θP^{−1} + P^{−1}R_θ − (R_θg + R_gθ) = 0, (20)

where R_θ = E[δθ δθ^T], R_θg = E[δθ δ^g^T], and R_gθ = E[δ^g δθ^T].

Equation (20) is a continuous Lyapunov equation, well known in control theory. Using the identity

 vec(ABC) = (C^T ⊗ A) vec(B), (21)

we can rewrite (20) as

 (I ⊗ R_θ + R_θ ⊗ I) vec(P^{−1}) = vec(R_θg + R_gθ) (22)

to solve for P^{−1}, where vec(·) forms a vector by stacking the columns of a matrix, ⊗ denotes the matrix Kronecker product, and I is a conformable identity matrix.

However, the solution given by (22) is not intuitive. We can solve for P^{−1} directly under the following mild assumptions:

• A1) ε has zero mean, i.e., E[ε] = 0.

• A2) δθ and ε are uncorrelated, i.e., E[δθ ε^T] = 0.

• A3) The covariance matrix R_θ is positive definite.

With the above assumptions, we have

 R_gθ = H_0R_θ,  R_θg = R_θH_0. (23)

Using (23), (20) becomes

 R_θ(P^{−1} − H_0) + (P^{−1} − H_0)R_θ = 0, (24)

or equivalently

 (I ⊗ R_θ + R_θ ⊗ I) vec(P^{−1} − H_0) = 0 (25)

by using (21). Since R_θ is positive definite, I ⊗ R_θ + R_θ ⊗ I is positive definite as well. Hence vec(P^{−1} − H_0) = 0 by (25), i.e., P^{−1} = H_0. Thus, as in deterministic optimization, with enough independent pairs (δθ, δ^g), criterion 1 leads to an asymptotically unbiased estimate of H_0^{−1}. Such an unbiasedness property is preferred in deterministic optimization; however, it might be undesirable in stochastic optimization, as such a preconditioner may significantly amplify the gradient noise. Furthermore, when preconditioner 1 is estimated using finitely many pairs (δθ, δ^g), P cannot be guaranteed to be positive definite even if H_0 is positive definite.

#### IV-A2 Criterion 2

By rewriting (16) as H_0^{−1}(δ^g − ε) = δθ, we may introduce another criterion for secant equation fitting,

 c_2(P) = E[‖Pδ^g − δθ‖²]. (26)

Naturally, the preconditioner determined by this criterion is called preconditioner 2. The derivative of c_2(P) with respect to P is

 ∂c_2(P)/∂P = δ^g e_2^T + e_2 δ^g^T, (27)

where e_2 = Pδ^g − δθ. By letting the derivative of c_2 with respect to P be zero, we find that the optimal P satisfies the equation

 PR_g + R_gP − R_gθ − R_θg = 0, (28)

where R_g = E[δ^g δ^g^T].

Again, (28) is a continuous Lyapunov equation, and it can be solved numerically. Unfortunately, there does not exist a simple closed-form relationship between the optimal P and H_0. Still, analytical solutions in simplified scenarios can cast crucial insight into the properties of this criterion. Under the assumption

A4) R_θ = σ_θ²I,  R_ε = E[εε^T] = σ_ε²I, (29)

the optimal P can be shown to be

 P = ∑_{i=1}^{m} [λ_i/(λ_i² + σ_ε²/σ_θ²)] u_iu_i^T, (30)

where H_0 = ∑_{i=1}^{m} λ_iu_iu_i^T is the eigenvalue decomposition of H_0, with u_i being its ith eigenvector associated with eigenvalue λ_i. As with criterion 1, criterion 2 leads to an unbiased estimate of H_0^{−1} when σ_ε = 0. But unlike criterion 1, the optimal P here underestimates the inverse of the Hessian when σ_ε > 0. Such a built-in annealing mechanism is actually desired in stochastic optimization. Again, when preconditioner 2 is estimated using finitely many pairs (δθ, δ^g), it can be indefinite even if H_0 is positive definite.
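The closed-form solution (30) is easy to tabulate. The sketch below uses a made-up diagonal Hessian; with no gradient noise it returns exactly H_0^{−1}, while with noise every eigenvalue of P is shrunk below the corresponding eigenvalue of H_0^{−1}:

```python
import numpy as np

H0 = np.diag([0.5, 1.0, 4.0])          # made-up positive definite Hessian
lam, U = np.linalg.eigh(H0)

def precond2(noise_ratio2):
    # Eq. (30): P = sum_i lambda_i/(lambda_i^2 + sigma_eps^2/sigma_theta^2) u_i u_i^T
    return U @ np.diag(lam / (lam**2 + noise_ratio2)) @ U.T

P_clean = precond2(0.0)    # sigma_eps = 0: exactly inv(H0), unbiased
P_noisy = precond2(0.25)   # sigma_eps > 0: underestimates inv(H0)
```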

### IV-B A New Criterion for Preconditioner Estimation

As examined in Section III, the essential utility of a preconditioner is to ensure that ϑ enjoys approximately uniform convergence rates across all directions. The amplitudes, but not the signs, of the eigenvalues of the Hessian are to be normalized. There is no need to explicitly estimate the Hessian. We propose the following new criterion for preconditioner estimation,

 c_3(P) = E[δ^g^T P δ^g + δθ^T P^{−1} δθ], (31)

and call it and the resultant preconditioner criterion 3 and preconditioner 3, respectively. The rationale behind criterion 3 is that when c_3(P) is minimized, we should have

 ∂c_3(κP)/∂κ |_{κ=1} = 0,

which suggests that its two terms, E[δ^g^T P δ^g] and E[δθ^T P^{−1} δθ], are equal, and thus the amplitudes of the perturbations of the preconditioned stochastic gradient match those of the parameter perturbations, as in the Newton method where the inverse of the Hessian is the preconditioner. Furthermore, c_3(P) is invariant to changes of the signs of δθ and δ^g, i.e., the pairs (δθ, δ^g) and (−δθ, −δ^g) result in the same cost. We present the following proposition to rigorously justify the use of criterion 3. To begin with, we first prove a lemma.

Lemma 1: If A is symmetric, A² = D, and D is a diagonal matrix with distinct nonnegative diagonal elements, then A = D_sign D^{0.5}, where D_sign is an arbitrary diagonal matrix with diagonal elements being either 1 or −1.

Proof: Let the eigenvalue decomposition of A be A = VΛV^T. Then A² = D suggests VΛ²V^T = D. Noting that the eigenvalue decomposition of D is unique when its diagonal elements are distinct, the columns of V must be signed standard basis vectors, and Λ² must coincide with D up to the ordering of its diagonal elements. Hence A = VΛV^T is itself diagonal with A² = D, i.e., A = D_sign D^{0.5}.

Proposition 3: For positive definite covariance matrices R_θ = E[δθ δθ^T] and R_g = E[δ^g δ^g^T], criterion c_3 determines an optimal positive definite preconditioner P scaling δ^g as PR_gP = R_θ. The optimal P is unique when R_θ^{0.5}R_gR_θ^{0.5} has distinct eigenvalues.

Proof: The derivative of c_3 with respect to P is

 ∂c_3(P)/∂P = E[δ^g δ^g^T] − P^{−1}E[δθ δθ^T]P^{−1}. (32)

By letting the gradient be zero, we obtain the following equation for the optimal P,

 PR_gP − R_θ = 0, (33)

which has the form of a continuous time algebraic Riccati equation known in control theory, but lacking the linear term in P. To solve for P, we rewrite (33) as

 (R_θ^{−0.5} P R_θ^{−0.5}) R_θ^{0.5} R_g R_θ^{0.5} (R_θ^{−0.5} P R_θ^{−0.5}) = I, (34)

where R_θ^{0.5} denotes the principal square root of the positive definite matrix R_θ. By introducing the eigenvalue decomposition

 R_θ^{0.5} R_g R_θ^{0.5} = UDU^T, (35)

we can rewrite (34) as

 (U^T R_θ^{−0.5} P R_θ^{−0.5} U) D (U^T R_θ^{−0.5} P R_θ^{−0.5} U) = I, (36)

or equivalently,

 D = (U^T R_θ^{−0.5} P R_θ^{−0.5} U)^{−2}. (37)

When R_θ^{0.5}R_gR_θ^{0.5} does not have repeated eigenvalues, the diagonal elements of D are distinct, and, applying Lemma 1 to the symmetric matrix U^T R_θ^{−0.5} P R_θ^{−0.5} U, whose square is D^{−1}, the solution must have the form

 P = R_θ^{0.5} U (D_sign D^{−0.5}) U^T R_θ^{0.5} (38)

by Lemma 1. For positive definite P, we can only choose D_sign = I, and thus the optimal P is unique. When R_θ^{0.5}R_gR_θ^{0.5} has repeated eigenvalues, (38) still gives a valid solution, but not necessarily the only one. The assumed nonsingularity of R_θ and R_g makes the involved matrix inversions feasible.
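The construction (34)–(38) can be replayed numerically: build P from the eigenvalue decomposition (35) with D_sign = I, and check that it is positive definite and satisfies the Riccati equation (33). The covariance matrices below are random examples:

```python
import numpy as np

rng = np.random.default_rng(5)
m = 4
A = rng.standard_normal((m, m))
R_theta = A @ A.T + m * np.eye(m)       # random positive definite covariances
B = rng.standard_normal((m, m))
R_g = B @ B.T + m * np.eye(m)

w, V = np.linalg.eigh(R_theta)
R_half = V @ np.diag(np.sqrt(w)) @ V.T  # principal square root R_theta^{0.5}

d, U = np.linalg.eigh(R_half @ R_g @ R_half)        # eigen decomposition (35)
P = R_half @ U @ np.diag(d ** -0.5) @ U.T @ R_half  # eq. (38) with D_sign = I
```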

Unlike the two secant equation fitting based preconditioners, preconditioner 3 is guaranteed to be positive definite as long as the estimated covariance matrices R_θ and R_g are positive definite. To gain more insight into the properties of preconditioner 3, we consider closed-form solutions in simplified scenarios that can explicitly link the optimal P to H_0. When R_θ and R_ε have the simple forms shown in (29), the closed-form solution for P is

 P = ∑_{i=1}^{m} [1/√(λ_i² + σ_ε²/σ_θ²)] u_iu_i^T, (39)

where H_0 = ∑_{i=1}^{m} λ_iu_iu_i^T is the eigenvalue decomposition of H_0. The eigenvalues of PH_0 are λ_i/√(λ_i² + σ_ε²/σ_θ²). When σ_ε/σ_θ ≫ |λ_i|, we have

 λ_i/√(λ_i² + σ_ε²/σ_θ²) → σ_θλ_i/σ_ε.

Thus, for a heavily noisy gradient, the optimal preconditioner cannot improve the eigenvalue spread, but it damps the gradient noise. Such a built-in step size adjusting mechanism is highly desired in stochastic optimization. When σ_ε = 0, P reduces to the ideal preconditioner constructed in the proof of Proposition 2, and all the eigenvalues of PH_0 have unitary amplitude. These properties make preconditioner 3 an ideal choice for preconditioned SGD.
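The limiting behavior of the eigenvalues of PH_0 under (39) can be sketched directly; the Hessian eigenvalues and noise levels below are made-up values for a non-convex case:

```python
import numpy as np

lam = np.array([-3.0, 0.5, 2.0])   # made-up Hessian eigenvalues (non-convex)

def precond3_eigs(noise_ratio):
    # eigenvalues of P H0 under (39):
    # lambda_i / sqrt(lambda_i^2 + (sigma_eps/sigma_theta)^2)
    return lam / np.sqrt(lam**2 + noise_ratio**2)

clean = precond3_eigs(0.0)     # no gradient noise: all amplitudes exactly 1
noisy = precond3_eigs(100.0)   # heavy noise: ~ sigma_theta lam / sigma_eps, damped
```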

### IV-C Relationship to Newton Method

In a Newton method for deterministic optimization, we have δg = H_0δθ, where δg is an exact gradient perturbation. Let us rewrite this secant equation in the matrix form H_0^{−1}E[δg δg^T]H_0^{−1} = E[δθ δθ^T], and compare it with the relation PE[δ^g δ^g^T]P = E[δθ δθ^T] from Proposition 3. Now it is clear that preconditioner 3 scales the stochastic gradient in a way comparable to the Newton method in deterministic optimization.

The other two preconditioners either over- or under-compensate the stochastic gradient, tending to cause divergence or to slow down convergence, as shown in our experimental results. To simplify the analysis, we consider the closed-form solutions of the first two preconditioners. For preconditioner 1, we have P = H_0^{−1} under assumptions A1, A2 and A3. Then with (16), we have

 PE[δ^g δ^g^T]P − E[δθ δθ^T] = H_0^{−1}E[εε^T]H_0^{−1} ⪰ 0,

which suggests that preconditioner 1 over-compensates the stochastic gradient, where ⪰ 0 means that the matrix on the left side is positive semidefinite. For preconditioner 2, using the closed-form solution in (30), we have

 PE[δ^g δ^g^T]P − E[δθ δθ^T] = −∑_{i=1}^{m} [σ_ε²/(λ_i² + σ_ε²/σ_θ²)] u_iu_i^T ⪯ 0,

which suggests that preconditioner 2 under-compensates the stochastic gradient, where ⪯ 0 means that the matrix on the left side is negative semidefinite.

## V Preconditioner Estimation Methods

It is possible to design different preconditioner estimation methods based on the criteria proposed in Section IV. To minimize the overhead of preconditioned SGD, we only consider the simplest preconditioner estimation methods: SGD algorithms for learning P that minimize the criteria in Section IV with mini-batch size 1. We call the SGD for θ the primary SGD, to distinguish it from the SGD for learning P. In each iteration of preconditioned SGD, we first evaluate the gradient twice to obtain ^g(θ) and ^g(θ + δθ). Then the pair (δθ, δ^g) is used to update the preconditioner estimate once. Lastly, (11) is used to update θ to complete one iteration of preconditioned SGD.

### V-A Dense Preconditioner

We focus on the algorithm design for criterion 3. Algorithms for the other two criteria are similar, and will be given without detailed derivation. In this subsection, we do not assume that the preconditioner has any sparse structure except for being symmetric, i.e., it is a dense matrix. Elements of the Hessian, and thus of the preconditioner, can have a large dynamic range. Additive learning rules like regular gradient descent may converge slowly when the preconditioner is poorly initialized. We find that multiplicative updates, e.g., relative (natural) gradient descent, can perform better due to their equivariant property [21, 20]. However, to use relative gradient descent, we need to find a Lie group representation for P. Positive definite matrices do not form a Lie group under matrix multiplication. Let us consider the Cholesky factorization

 P = Q^TQ, (40)

where Q is an upper triangular matrix with positive diagonal elements. It is straightforward to show that all upper triangular matrices with positive diagonal elements form a Lie group under matrix multiplication. Thus, in order to use relative gradient descent, we shall learn the matrix Q. One desirable by-product of the Cholesky factorization is that the resultant triangular systems can be efficiently solved by forward or backward substitution.
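The group properties used here are easy to check numerically: products and inverses of upper triangular matrices with positive diagonals stay in the set, and P = Q^TQ is symmetric positive definite. The sketch below uses random examples:

```python
import numpy as np

rng = np.random.default_rng(6)

def random_ut(n):
    # random upper triangular matrix with positive diagonal elements
    Q = np.triu(rng.standard_normal((n, n)))
    np.fill_diagonal(Q, np.abs(np.diag(Q)) + 0.1)
    return Q

Q1, Q2 = random_ut(4), random_ut(4)
prod = Q1 @ Q2               # closure: upper triangular with positive diagonal
inv1 = np.linalg.inv(Q1)     # inverses stay in the group as well
P = Q1.T @ Q1                # factorization (40): P is symmetric positive definite
```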

Following the third criterion, the instantaneous cost to be minimized is

 ^c_3(P) = δ^g^T P δ^g + δθ^T P^{−1} δθ, (41)

which is an approximation of (31) with mini-batch size 1. We consider a small perturbation of Q given by δQ = EQ, where E is an infinitesimal upper triangular matrix such that Q + δQ still belongs to the same Lie group. The relative gradient is defined by

 ∇E = ∂^c_3(Q + EQ)/∂E |_{E=0},

where, with a slight abuse of notation, ^c_3 is rewritten as a function of Q. Using (18), we can show that

 ∇E = 2 triu(Qδ^g δ^g^T Q^T − Q^{−T}δθ δθ^T Q^{−1}), (42)

where the operator triu(·) takes the upper triangular part of a matrix. Then Q can be updated as

 Q[new] = Q[old] − μ_Q ∇E Q[old], (43)

where μ_Q is a small enough step size such that the diagonal elements of Q[new] remain positive. To simplify the step size selection, the normalized step size

 μ_Q = μ_Q0 / max|∇E| (44)

can be used, where 0 < μ_Q0 ≤ 1, and max|∇E| denotes the maximum element-wise absolute value of ∇E. Another normalized step size,

 μ_Q = μ_Q0 / max|diag(∇E)|,

can be useful, and it also ensures that Q[new] belongs to the same Lie group when μ_Q0 < 1, where max|diag(∇E)| denotes the maximum absolute value of the diagonal elements of ∇E. Our experience suggests that the two step size normalization strategies are comparable in stochastic optimization. However, the one given in (44) seems preferable in deterministic optimization, as it leads to more stable convergence.

We summarize the complete preconditioned SGD as below.

One complete iteration of preconditioned SGD with preconditioner 3

Inputs are θ[old] and Q[old]; outputs are θ[new] and Q[new].

• Sample δθ, and calculate ĝ(θ[old]) and δĝ.

• Calculate Q δĝ, and Q^{-T} δθ via solving the triangular system Q^T x = δθ.

• Update the preconditioner by

    Q[new] = Q[old] − (μ_{Q,0} / max|∇_E|) ∇_E Q[old],

where 0 < μ_{Q,0} < 1, and ∇_E = 2 triu(Q δĝ δĝ^T Q^T − Q^{-T} δθ δθ^T Q^{-1}).

• Update θ by

    θ[new] = θ[old] − μ_{θ,0} (Q[new])^T Q[new] ĝ(θ[old]),

where μ_{θ,0} > 0 is the step size for the parameter update.

Here, elements of δθ can be sampled from a Gaussian distribution with a small variance. Then the only tunable parameters are the two normalized step sizes, μ_{Q,0} and μ_{θ,0}.
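The complete iteration above can be sketched in NumPy as follows. This is an illustrative toy implementation, not the paper's Matlab code: the step sizes, the perturbation scale `delta`, and the quadratic test cost are assumptions made for the demo.

```python
import numpy as np

def psgd_step(theta, Q, grad_fn, rng, mu_theta=0.05, mu_q=0.05, delta=1e-4):
    # One iteration of preconditioned SGD with preconditioner 3 (sketch).
    # theta: parameter vector; Q: upper triangular factor of P = Q^T Q.
    g = grad_fn(theta)                           # (stochastic) gradient at theta
    dtheta = delta * rng.normal(size=theta.size) # sample a small perturbation
    dg = grad_fn(theta + dtheta) - g             # resulting gradient perturbation
    a = Q @ dg
    b = np.linalg.solve(Q.T, dtheta)             # Q^{-T} dtheta via triangular solve
    grad_E = 2.0 * np.triu(np.outer(a, a) - np.outer(b, b))
    Q = Q - (mu_q / np.max(np.abs(grad_E))) * (grad_E @ Q)   # normalized step (44)
    theta = theta - mu_theta * (Q.T @ (Q @ g))   # preconditioned parameter update
    return theta, Q

# demo on a toy quadratic cost 0.5 * theta^T H theta with exact gradients
H = np.diag([1.0, 4.0, 9.0])
theta, Q = np.ones(3), np.eye(3)
rng = np.random.default_rng(1)
for _ in range(300):
    theta, Q = psgd_step(theta, Q, grad_fn=lambda th: H @ th, rng=rng)
assert 0.5 * theta @ H @ theta < 1e-3
```

On this quadratic, the learned P approaches a scaled inverse Hessian, so the preconditioned update converges quickly in all eigen-directions despite the spread of curvatures.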

Algorithm design for the other two preconditioners is similar, and we give their relative gradients for updating Q without derivation. For criterion 1, the relative gradient is

    ∇_E = 2 triu(Q^{-T} δθ e_1^T Q^{-1} + Q^{-T} e_1 δθ^T Q^{-1}),

where e_1 = δĝ − P^{-1} δθ is the fitting residual of criterion 1. For criterion 2, the relative gradient is

    ∇_E = 2 triu(Q δĝ e_2^T Q^T + Q e_2 δĝ^T Q^T),

where e_2 = P δĝ − δθ is the fitting residual of criterion 2. It is worth mentioning that by writing P = Q^T Q when using criteria 1 and 2, we are forcing the preconditioners to be positive definite, although the optimal solution minimizing criterion 1 or 2 is not necessarily positive definite. For criterion 3, by writing P = Q^T Q, we are selecting the positive definite solution among many possible ones.

### V-B Preconditioner with Sparse Structures

In practice, θ can have millions of free parameters to learn, and thus a dense P might have trillions of elements to estimate and store. Clearly, a dense representation of P is no longer feasible. For large-scale problems, we need to assume that P has a certain sparse structure so that it can be manipulated. One extreme example is to treat P as a diagonal matrix. However, such a simplification is too coarse, and generally does not lead to significant performance gains over the plain SGD. Application specific knowledge may play the key role in determining a proper form for P.

One example is to assume that Q has the structure

    Q = [ Q_11  Q_12
           0    Q_22 ],

where Q_11 is an upper triangular matrix, Q_12 is a dense matrix, Q_22 is a diagonal matrix, and all the diagonal elements of Q_11 and Q_22 are positive. It is straightforward to show that such limited-memory triangular matrices form a Lie group, and thus relative gradient descent applies here. The Cholesky factor of an ill-conditioned matrix can be well approximated in this form. However, the dimension of Q_11 should be properly selected to achieve a good trade-off between representation complexity and accuracy.
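The closure of this limited-memory family under multiplication can be checked directly; the NumPy sketch below (illustrative, with arbitrary block sizes) verifies that the product of two such matrices keeps the same sparsity pattern and a positive diagonal:

```python
import numpy as np

def lm_upper(rng, r, n):
    # limited-memory form: Q = [[Q11, Q12], [0, Q22]], with Q11 (r x r)
    # upper triangular, Q12 dense, Q22 diagonal, all diagonal elements positive
    Q = np.zeros((n, n))
    Q[:r, :r] = np.triu(rng.normal(size=(r, r)))
    Q[:r, r:] = rng.normal(size=(r, n - r))
    Q[np.diag_indices(n)] = np.exp(rng.normal(size=n))  # positive diagonal
    return Q

rng = np.random.default_rng(2)
r, n = 2, 6
A, B = lm_upper(rng, r, n), lm_upper(rng, r, n)
C = A @ B

# closure: the product keeps the same structure and a positive diagonal
assert np.allclose(C, np.triu(C))                       # upper triangular
assert np.allclose(C[r:, r:], np.diag(np.diag(C)[r:]))  # diagonal lower block
assert np.all(np.diag(C) > 0)
```

The block structure makes the check transparent: the product's lower-right block is the product of two diagonal matrices, and its lower-left block is identically zero.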

In many situations, the parameters to be optimized naturally form a multi-dimensional array, reflecting certain built-in structures of the training data and the parameter space. Hence we may approximate the preconditioner as the Kronecker product of a series of small matrices. For example, the preconditioner for an m × n × k parameter array may be approximated as the Kronecker product of three small matrices with sizes m × m, n × n, and k × k. Such a bold simplification often works well in practice. Interestingly, similar ideas have been exploited in [19, 18], and achieved certain successes.

To explain the above idea clearly, let us consider a concrete example where the parameters to be optimized naturally form a two dimensional array, i.e., a matrix, denoted by Θ. Its stochastic gradient, Ĝ(Θ), is a matrix with the same dimensions. The preconditioned SGD for updating Θ is

    vec(Θ[new]) = vec(Θ[old]) − μ P vec[Ĝ(Θ[old])]. (45)

To simplify the preconditioner estimation, we may assume that P = P_2 ⊗ P_1, and thus can rewrite (45) as

    Θ[new] = Θ[old] − μ P_1 Ĝ(Θ[old]) P_2 (46)

using (21), where P_1 and P_2 are two positive definite matrices with conformable dimensions. Similarly, by adopting the Cholesky factorizations

    P_1 = Q_1^T Q_1,  P_2 = Q_2^T Q_2,

we can use relative gradients to update Q_1 and Q_2 as Q_1[new] = Q_1[old] − μ_1 ∇_{E_1} Q_1[old] and Q_2[new] = Q_2[old] − μ_2 ∇_{E_2} Q_2[old]. For criterion 3, the two relative gradients are given by

    A = Q_1 δĜ Q_2^T,
    B = Q_2^{-T} δΘ^T Q_1^{-1},
    ∇_{E_1} = 2 triu(A A^T − B^T B),
    ∇_{E_2} = 2 triu(A^T A − B B^T),

where B can be calculated by solving triangular systems to avoid explicit matrix inversion. The relative gradients for preconditioner updating with criteria 1 and 2 have similar forms, and are not listed here.
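The equivalence of (45) and (46) under the assumption P = P_2 ⊗ P_1 follows from the identity vec(A X B) = (B^T ⊗ A) vec(X) with symmetric P_2. A quick NumPy check (dimensions and data are arbitrary; `order="F"` gives the column-major vec used in this identity):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4
X1, X2 = rng.normal(size=(m, m)), rng.normal(size=(n, n))
P1 = X1 @ X1.T + m * np.eye(m)   # positive definite factors of P = P2 kron P1
P2 = X2 @ X2.T + n * np.eye(n)
G = rng.normal(size=(m, n))

# (45): apply P = P2 kron P1 to vec(G), using column-major vec
lhs = (np.kron(P2, P1) @ G.reshape(-1, order="F")).reshape((m, n), order="F")
# (46): equivalent matrix form that never builds the mn x mn Kronecker product
rhs = P1 @ G @ P2
assert np.allclose(lhs, rhs)
```

The matrix form on the right costs O(m²n + mn²) operations instead of O(m²n²), which is the whole point of the Kronecker-product approximation.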

## Vi Experimental Results

Five sets of experimental results are reported in this section. For the preconditioner estimation, we always use mini-batch size 1, initialize Q to an identity matrix, use the normalized step size in (44), and sample δθ from a Gaussian distribution N(0, eps), where eps is the machine accuracy in double precision. For the primary SGD, a fixed mini-batch size is used in all examples except for the blind equalization one. The step size for the preconditioned SGD is selected from a small range, and in many cases, this range works for the plain SGD as well. Supplementary materials, including a Matlab code package reproducing all the experimental results here, are available at https://sites.google.com/site/lixilinx/home/psgd.

### Vi-a Criteria and Preconditioners Comparisons

This experiment compares the three preconditioner estimators developed in Section V. The true Hessian is a symmetric random constant matrix whose elements are drawn from a normal distribution. We vary three factors for the performance study: positive definite Hessian versus indefinite Hessian; noise free gradient versus heavily noisy gradient; and large scale Hessian versus tiny scale Hessian. In total, this gives eight different testing scenarios. Samples of δθ and δĝ are generated by model (16), and the task is to estimate a preconditioner with the three desirable properties listed in Section III.B using SGD.

### Vi-B Blind Equalization (Deconvolution)

Let us consider a blind equalization (deconvolution) problem using the constant modulus criterion [2]. Blind equalization is an important communication signal processing problem, and SGD is a default method for it. This problem is weakly non-convex in the sense that, for an equalizer with enough taps, the constant modulus criterion has no local minimum [22]. In our settings, the source signal is uniformly distributed, the communication channel is simulated by a linear filter, and the equalizer, an adaptive finite impulse response (FIR) filter, is initialized by setting its center tap to 1 and all other taps to zero. Step sizes of the compared algorithms are manually adjusted such that their steady state intersymbol interference (ISI) performance indices are comparable, where the ISI is measured on the combined channel–equalizer response.

Fig. 3 summarizes the results. A preconditioner constructed in the same way as in the proof of Proposition 2, using an adaptively estimated Hessian matrix, achieves the fastest convergence; however, such a solution may be too complicated in practice. Preconditioner 3 performs reasonably well considering its simplicity. Preconditioner 2 completely fails to improve the convergence with a small step size, while a larger step size leads to divergence. Preconditioner 1 does accelerate the convergence due to the weakly non-convex nature of the cost function. The plain SGD converges the slowest, and its steady state ISI is bumpy and higher than that of the preconditioned methods.

### Vi-C A Toy Binary Classification Problem

In this problem, the input features x_1 and x_2 are independently and uniformly distributed, and the class label is

    y = mod(round(10 x_1 − 10 x_2), 2),

where round(·) rounds a real number to its nearest integer, and mod(·, 2) denotes the modulus after division by 2. Fig. 4 shows the resulting zebra stripe like pattern to be learned. It is a challenging classification problem due to its sharp, interleaved, and curved class boundaries. A two layer feedforward neural network is trained as the classifier. As a common practice, the input features are normalized before being fed to the network, and the network coefficients of a certain layer are initialized as random numbers with variance proportional to the inverse of the number of nodes connected to this layer. Cross entropy loss is the training cost. A dense preconditioner is estimated and used in preconditioned SGD.

Fig. 5 presents one set of typical learning curves with eight different settings. Using the plain SGD as a baseline, we find that preconditioner 2 fails to speed up the convergence, while both preconditioner 1 and preconditioner 3 significantly accelerate it, with preconditioner 3 performing far better than preconditioner 1. Note that there is no trivial remedy to improve the convergence of SGD: a larger step size makes SGD diverge. We have tried RMSProp, and found that it does accelerate the convergence during the initial iterations, but it eventually converges to solutions inferior to those of SGD.

### Vi-D Recurrent Neural Network Training

Let us consider a more challenging problem: learning extremely long term dependencies with a recurrent neural network. The addition problem initially proposed in [23] is considered. The input is a sequence with two rows. The first row contains random numbers independently and uniformly distributed. Elements in the second row are zeros, except that two of them are marked with 1. The desired output of the whole sequence is the sum of the two random numbers from the first row that are marked with 1 in the second row. More details can be found in [23] and our supplementary materials.
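A generator for one training sequence of the addition problem can be sketched as below (NumPy; the uniform range [0, 1] for the first row is an assumption for illustration, as is the sequence length):

```python
import numpy as np

def addition_example(rng, T):
    # one sequence of the addition problem: row 0 holds T random numbers,
    # row 1 is an indicator marking two positions with 1;
    # the target is the sum of the two marked numbers
    x = np.zeros((2, T))
    x[0] = rng.uniform(0.0, 1.0, size=T)
    i, j = rng.choice(T, size=2, replace=False)
    x[1, i] = x[1, j] = 1.0
    target = x[0, i] + x[0, j]
    return x, target

rng = np.random.default_rng(5)
x, target = addition_example(rng, 30)
assert x.shape == (2, 30) and x[1].sum() == 2.0
assert abs(target - x[0][x[1] == 1.0].sum()) < 1e-12
```

Since the two marked positions can be far apart, the network must carry information across many time steps, which is exactly the long term dependency being tested.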

In our settings, a standard (vanilla) recurrent neural network is trained to predict the output by minimizing the mean squared error. The feedforward coefficients are initialized as Gaussian random numbers with mean zero and small variance, and the feedback matrix is initialized as a random orthogonal matrix to encourage long term memories. Backpropagation through time [4] is used for gradient calculation. A dense preconditioner would be expensive for this network, and thus a sparse one is used. Coefficients in the first layer naturally form a matrix, and coefficients in the second layer form a vector. Thus we approximate the preconditioner for the gradient of the parameters in the first layer as the Kronecker product of two small matrices, and the preconditioner for the gradient of the parameters in the second layer is a small dense matrix. The preconditioner for the gradient of the whole parameter vector is the direct sum of the preconditioner of the first layer and that of the second layer.

Fig. 6 summarizes the results of a typical run. Only the preconditioned SGD using preconditioner 3 converges. A recurrent neural network can be regarded as a feedforward one with extremely large depth by unfolding its connections in time. It is known that the issues of vanishing and exploding gradients arise in deep neural networks, and SGD can hardly handle them [24]. Preconditioned SGD seems to perform quite well on such challenging problems. More testing results on the eight pathological recurrent neural network training problems proposed in [23] are reported in the supplementary materials and [25]; they suggest that preconditioned SGD performs no worse than Hessian-free optimization, although our method has a significantly lower complexity and involves less parameter tweaking.

### Vi-E MNIST Handwritten Recognition

In the last experiment, we consider the well known MNIST handwritten digit recognition problem [26]. The training data are images of handwritten digits