A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization

05/13/2019
by Jia Bi, et al.

A number of optimization approaches have been proposed for nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent (SGD), and stochastic variance reduced gradient (SVRG) descent. Theory shows that these methods converge when they use an unbiased gradient estimator. In practice, however, a biased gradient estimator can reach the vicinity of a solution more efficiently, since the unbiased approach is computationally more expensive. Achieving fast convergence therefore involves two trade-offs: between stochastic and batch computation, and between biased and unbiased estimation. This paper proposes an integrated approach that controls the stochastic element of the optimizer and balances the estimator between biased and unbiased through a single hyper-parameter. It is shown theoretically and experimentally that this hyper-parameter can be configured to provide an effective balance that improves the convergence rate.
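The abstract does not spell out the estimator, but the idea of a single hyper-parameter trading bias against variance can be sketched with an SVRG-style update whose stochastic correction term is shrunk by a factor lam. This is a hypothetical construction for illustration only, not the paper's algorithm; the name biased_svrg, the parameter lam, and the synthetic least-squares problem are all assumptions.

import numpy as np

# Hypothetical sketch (not the paper's exact method): an SVRG-style update in
# which the stochastic correction is scaled by a hyper-parameter lam in [0, 1].
# lam = 1 gives the standard unbiased SVRG estimator; lam < 1 reduces the
# per-step variance by a factor lam^2 at the cost of a bias toward the full
# gradient computed at the last snapshot.

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))         # synthetic least-squares problem
b = rng.normal(size=n)

def grad_i(x, i):
    # Gradient of the i-th component f_i(x) = 0.5 * (a_i^T x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    # Full (batch) gradient: the average of the component gradients.
    return A.T @ (A @ x - b) / n

def biased_svrg(lam=0.8, step=0.05, epochs=30, inner=200):
    x = np.zeros(d)
    for _ in range(epochs):
        x_snap = x.copy()           # snapshot point
        mu = full_grad(x_snap)      # expensive full gradient, reused all epoch
        for _ in range(inner):
            i = rng.integers(n)
            # lam scales the stochastic difference: lam = 1 is unbiased SVRG,
            # lam < 1 trades bias for lower variance per step.
            g = lam * (grad_i(x, i) - grad_i(x_snap, i)) + mu
            x -= step * g
    return x

x_hat = biased_svrg(lam=0.8)
print("final objective:", 0.5 * np.mean((A @ x_hat - b) ** 2))

In this sketch the expected update is lam * grad f(x) + (1 - lam) * grad f(x_snap), so lam = 1 recovers the unbiased estimator while smaller values bias the step toward the stale snapshot gradient but lower its variance; an intermediate value plays the role of the balancing hyper-parameter described in the abstract.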

