Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function

02/14/2020
by Lingkai Kong, et al.

This article suggests that deterministic gradient descent (GD), which does not use any stochastic gradient approximation, can still exhibit stochastic behavior. In particular, it shows that if the objective function exhibits multiscale behavior, then in a large-learning-rate regime that resolves only the macroscopic but not the microscopic details of the objective, the deterministic GD dynamics can become chaotic and converge not to a local minimizer but to a statistical distribution. A sufficient condition is also established for approximating this long-time statistical limit by a rescaled Gibbs distribution. Both theoretical and numerical demonstrations are provided, and the theoretical part relies on the construction of a stochastic map that uses bounded noise (as opposed to discretized diffusions).
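To make the large-learning-rate phenomenon concrete, here is a minimal Python sketch. The two-scale objective f(x) = x²/2 + ε sin(x/ε), the step sizes, and the surrogate map are assumptions chosen for illustration; they are not claimed to be the paper's exact setup or its stochastic-map construction. With a learning rate much smaller than ε, GD settles into a local minimizer. With a learning rate much larger than ε (but at most 1), the same deterministic iteration keeps moving inside a bounded interval, and its long-run spread can be compared with a map driven by bounded noise.

```python
import math
import random
import statistics

# Hypothetical two-scale objective (an assumption, not necessarily the paper's):
#   f(x) = x**2 / 2 + eps * sin(x / eps)
# The macroscopic part x**2/2 is minimized at 0; the microscopic part adds
# O(eps)-amplitude ripples whose gradient cos(x/eps) is O(1).
EPS = 1e-3


def grad_f(x: float, eps: float = EPS) -> float:
    """Gradient of the two-scale objective: x + cos(x / eps)."""
    return x + math.cos(x / eps)


def gradient_descent(x0: float, lr: float, steps: int) -> list[float]:
    """Plain deterministic GD: x_{k+1} = x_k - lr * f'(x_k)."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)
        xs.append(x)
    return xs


def bounded_noise_map(x0: float, lr: float, steps: int, seed: int = 0) -> list[float]:
    """Hypothetical surrogate: keep the macroscopic gradient x but replace the
    microscopic term cos(x/eps) by cos(2*pi*U), U ~ Uniform[0, 1). The noise is
    bounded in [-1, 1], unlike the Gaussian increments of a discretized diffusion."""
    rng = random.Random(seed)
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * (x + math.cos(2 * math.pi * rng.random()))
        xs.append(x)
    return xs


# Small learning rate (lr << eps) resolves the ripples: GD descends into a
# nearby local minimizer, so the gradient vanishes at the last iterate.
small = gradient_descent(0.5, lr=1e-4, steps=20_000)

# Large learning rate (lr >> eps) only "sees" the macroscopic scale. The
# iterates never settle, yet stay bounded because
# |x_{k+1}| <= (1 - lr) * |x_k| + lr <= 1 whenever |x_k| <= 1 and lr <= 1.
large = gradient_descent(0.5, lr=0.1, steps=20_000)
surrogate = bounded_noise_map(0.5, lr=0.1, steps=20_000)

# Discard the first half as burn-in and summarize the long-run behavior.
tail_det = large[10_000:]
tail_sto = surrogate[10_000:]
print(f"small lr:  final x = {small[-1]:.6f}, |f'(x)| = {abs(grad_f(small[-1])):.2e}")
print(f"large lr:  mean = {statistics.fmean(tail_det):+.3f}, std = {statistics.pstdev(tail_det):.3f}")
print(f"surrogate: mean = {statistics.fmean(tail_sto):+.3f}, std = {statistics.pstdev(tail_sto):.3f}")
```

The small-step run ends at a critical point near its starting value, while the large-step run keeps a nonzero spread indefinitely. A histogram of `tail_det` is the natural next step for comparing the iterates against a candidate limiting distribution, such as the rescaled Gibbs form the article discusses.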

research
10/25/2021

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates

Large-scale optimization problems require algorithms both effective and ...
research
05/04/2021

Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis

Stochastic gradient descent (SGD) is one of the most popular algorithms ...
research
02/14/2023

Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent

We propose new limiting dynamics for stochastic gradient descent in the ...
research
04/12/2021

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

We propose Meta-Regularization, a novel approach for the adaptive choice...
research
09/04/2021

On Faster Convergence of Scaled Sign Gradient Descent

Communication has been seen as a significant bottleneck in industrial ap...
research
09/05/2017

Stochastic Gradient Descent: Going As Fast As Possible But Not Faster

When applied to training deep neural networks, stochastic gradient desce...
research
02/28/2020

BigSurvSGD: Big Survival Data Analysis via Stochastic Gradient Descent

In many biomedical applications, outcome is measured as a “time-to-event...
