The strong statistical performance of Stochastic Gradient Descent (SGD) and its variants has enabled the effective training of large–scale statistical learning models. It is widely believed that SGD acts as an “implicit regularizer” that helps it find local minimum points that generalize well (see , , ). This belief stems from its remarkable empirical performance. To justify this belief on solid theoretical grounds, the optimization behavior of SGD and its impact on generalization have been studied through attempts to characterize how SGD escapes from stationary points, including saddle points and local minima (see , ). Another stream of research focuses on interpreting SGD as performing variational inference, in which it is found that SGD minimizes the Kullback–Leibler divergence between its stationary distribution and a posterior (see , ). Along this direction, it has been realized that SGD performs variational inference using a new potential function that it implicitly constructs, given an architecture and a dataset.
In the research mentioned above, “implicit regularization” was explained as being related to the “noise tied to architecture” (see [3, Section 6]) arising from, e.g., dropout or the small mini–batches used in SGD. For example, in the variational inference approach, the new potential function, denoted \(\Phi\), can be shown to equal the original loss function up to a scalar multiple when the noise in SGD is isotropic (see [3, Lemma 6]). However, in realistic settings, due to the special architecture of deep networks, the gradient noise is highly non–isotropic (anisotropic, see [3, Section 4.1], , ), and it is believed that the potential \(\Phi\) then possesses properties that lead to both generalization and acceleration (see ). From the perspective of escaping from stationary points, it has also been found that anisotropic noise leads to faster escape from sharp local minima, which validates previous results from both empirical and theoretical analyses (see , , , ).
In this work, we provide a unified approach to these problems. Instead of the potential function \(\Phi\), we construct another potential function, the (global) quasi–potential \(V\), which characterizes the long–time behavior of SGD with small learning rate. We demonstrate that, as the learning rate tends to zero, SGD finally enters the global minimum of the (global) quasi–potential function \(V\). It is interesting to observe that when the loss landscape possesses only one global minimum and the noise is isotropic, the (global) quasi–potential that we construct reduces to a local quasi–potential and agrees with the original loss function up to a multiplicative constant. However, in the presence of multiple potential wells, SGD with anisotropic noise minimizes a (global) quasi–potential whose landscape differs from that of the original loss function.
The construction of the (global) quasi–potential function proceeds by first calculating the local quasi–potential function, which is valid within one basin of attraction around a specific local minimum point. Analytically, this local quasi–potential is the solution of a variational problem whose Lagrangian depends explicitly on the covariance structure of the noise (e.g., as determined by the architecture of the neural network) and on the loss landscape. By classical calculus of variations, such a function can be calculated from a partial differential equation of Hamilton–Jacobi type with a prescribed boundary condition. From this perspective one can explicitly see the dependence of the quasi–potential on the covariance structure of the noise. This suggests that the quasi–potential can capture the effects of the implicit regularization brought in by the “noise tied to architecture” in algorithms like SGD. In particular, the highly anisotropic noise in SGD induces a (global) quasi–potential function that differs from the original loss function, and this is more relevant to the training of real deep neural networks.
Our results concern the case in which the learning rate asymptotically tends to zero. This differs from previous works that consider the case of a moderate learning rate. We demonstrate that even in this regime, implicit regularization still exists and manifests itself through the (global) quasi–potential function \(V\). This is mainly due to the exponentially long escape time from local minima induced by the small noise in SGD. Our analysis relies on Large Deviations Theory (LDT) (see , , ), in which exponentially small transition probabilities are estimated through a path integral along trajectories exiting from local minima. These exponents depend on the learning rate as well as on the noise covariance structure. Via the Markov property of the SGD process, such exponentially small exit probabilities can be turned into estimates of the exponentially long escape time from the basins of attraction around local minima. As the exponential complexity of SGD–based algorithms has also been noted via information–theoretic guarantees, our work demonstrates that the exponentially long escape time from local minima induced by small noise in SGD should be a key factor leading to implicit regularization.
Although our constructed (global) quasi–potential function \(V\) differs from the potential function \(\Phi\) constructed via variational inference (see ), the two are inter–connected via the steady–state distribution of the Fokker–Planck equation corresponding to the stochastic dynamics of SGD. In fact, the potential function constructed in variational inference comes directly from the exponent in the exponential form of the density of the stationary distribution. This is the case when the learning rate (the step size of the recursive scheme) is moderate. When the learning rate is small, the normalizing constant (in the language of statistical physics, the partition function) also scales with the learning rate and affects the exponent from which the previous potential function \(\Phi\) is calculated. In the limit as the learning rate tends to zero, this leads to our new potential function, the (global) quasi–potential \(V\).
Another interesting point is that, from our probabilistic considerations, we can understand not only the “final” long–time behavior of the SGD algorithm, i.e., which minimum is eventually reached, but also the dynamics of SGD jumping between different local minima at intermediate time scales. Such “metastable” behavior can be characterized by a Markov chain with exponentially small transition probabilities between local minima of the quasi–potential. This Markov chain reflects the procedure by which SGD selects the specific local minimum points that it favors (such as those with good generalization properties). From here we can characterize the mechanism of implicit regularization via this Markov chain associated with the quasi–potential. We illustrate this point via an example (Example 4.1) in Section 4.
The paper is organized as follows. In Section 2 we review basic facts about continuous–time SGD and variational inference. In Section 3 we demonstrate the construction of the local quasi–potential function, in particular how this quantity is related to the noise covariance matrix (sometimes also referred to as the diffusion matrix) via a partial differential equation of Hamilton–Jacobi type. We accompany this section with an example showing that when the diffusion matrix is anisotropic, the escape from a local minimum can be faster than with isotropic noise. We further demonstrate the construction of the global quasi–potential via an example in Section 4, where the loss function is given by a two–well potential in dimension one and the diffusion matrix is anisotropic. We illustrate the construction of the (global) quasi–potential and show that its landscape differs from that of the original loss function. We also use this example to demonstrate the metastable dynamics of SGD, namely a Markov chain between different local minimum points. This Markov chain leads to SGD eventually being trapped at the global minimum of the quasi–potential, which may differ from the global minimum of the original loss function. In Section 5 we provide numerical results for the examples of Sections 3 and 4. Finally, we conclude and propose future directions in Section 6.
Main contributions of this paper. In this work, a new potential function, the quasi–potential, is introduced, and the variational inference view of SGD is interpreted as minimizing the quasi–potential function. By making use of LDT and classical calculus of variations, a relation between the quasi–potential and the noise covariance structure (the diffusion matrix of SGD) is revealed through a partial differential equation of Hamilton–Jacobi type. This relation helps to show that anisotropic noise leads to faster escape than isotropic noise. Furthermore, the mechanism of “implicit regularization” is explained through a Markov chain between local minimum points of the quasi–potential. This Markov chain is induced by the noise in SGD and is tied to the noise covariance structure via the relation between the diffusion matrix and the quasi–potential. In summary, this work proposes a quantitative way to understand the phenomenon of “implicit regularization” by introducing the quasi–potential and relating it to the noise covariance structure via partial differential equations and stochastic dynamics.
Comparison with previous works. There is a large literature dedicated to the empirical fact that SGD favors local minimum points with good generalization properties (see for example , , ). On the theoretical side, attempts have been made to explain this “implicit regularization” of the SGD process through its stochastic dynamics, such as the escape from stationary points (see , , , , , ). Variational inference has been discussed in works such as , , . There have also been attempts to relate the covariance structure of the SGD noise to its generalization properties (see , ), in particular how anisotropic noise leads to fast escape from saddle points and local minimum points (see , ). Compared to these previous works, our work proposes a unified way to understand the connection between SGD's noise covariance structure and its selection of specifically favored local minimum points. The novelty is that the quasi–potential function we introduce can be quantitatively related to SGD's noise covariance structure via a partial differential equation of Hamilton–Jacobi type. This provides a general analytic tool for comparing the effects of isotropic vs. anisotropic noise. Our LDT–based analysis also offers further insight into the mechanism behind implicit regularization. Indeed, from LDT we understand that SGD selects its favored local minimum points by performing a Markov chain between different local minima, and the behavior of this Markov chain is determined by SGD's noise covariance structure.
2 Background on continuous–time SGD and the stationary distribution, statement of the problem.
2.1 Continuous–time SGD.
Stochastic Gradient Descent (SGD) with a constant learning rate is a stochastic analogue of the gradient descent algorithm, aiming at finding the local or global minimizers of the expectation of a function parameterized by some random variable. Schematically, the algorithm can be interpreted as targeting a local minimum point of the expectation
\[ f(x) = \mathbb{E}_{\gamma}\big[ f(x;\gamma) \big], \tag{1} \]
where the index random variable \(\gamma\) follows some prescribed distribution, and \(x \in \mathbb{R}^d\) is the weight vector. Stochastic gradient descent updates \(x_k\) via the iteration
\[ x_{k+1} = x_k - \eta\, \nabla f(x_k; \gamma_k), \tag{2} \]
where \(\eta > 0\) is a fixed step size, which is also the learning rate, and the \(\gamma_k\) are i.i.d. random variables that have the same distribution as \(\gamma\). In particular, in the case of training a deep neural network, the random variable \(\gamma\) samples size–\(B\) mini–batches (\(B \ll N\)) uniformly from an index set \(\{1, \ldots, N\}\). In this case, given loss functions \(f_i\) on \(N\) training data points, we have \(f(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x)\) and \(f(x;\gamma) = \frac{1}{B}\sum_{i \in \gamma} f_i(x)\). Set
\[ \xi_k(x) = \nabla f(x; \gamma_k) - \nabla f(x), \]
the mini–batch gradient noise.
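As a concrete illustration, the iteration (2) can be run on a toy least–squares problem; the setup below (data sizes, step size, and all variable names) is illustrative and not taken from the paper.

```python
import numpy as np

# Toy setup: least-squares losses f_i(x) = 0.5*(a_i . x - b_i)^2, so that
# f(x) = (1/N) sum_i f_i(x).  All constants are chosen for illustration only.
rng = np.random.default_rng(0)
N, d, B = 200, 5, 10            # data points, dimension, mini-batch size
A = rng.normal(size=(N, d))
x_true = rng.normal(size=d)
b = A @ x_true                  # noiseless targets: every f_i shares a minimizer

def batch_grad(x, idx):
    """Gradient of the averaged loss over the mini-batch `idx`."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

# SGD iteration (2): x_{k+1} = x_k - eta * grad f(x_k; gamma_k)
eta, x = 0.01, np.zeros(d)
for _ in range(3000):
    batch = rng.choice(N, size=B, replace=False)   # gamma_k: a random mini-batch
    x = x - eta * batch_grad(x, batch)

print(np.linalg.norm(x - x_true))   # small: SGD has found the minimizer
```

Because the targets are noiseless, the gradient noise vanishes at the minimizer, so the iterates converge to it; with noisy targets the iterates would instead fluctuate around it, which is exactly the regime studied below.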
Based on the iteration (2), we introduce a stochastic differential equation (SDE) modeling the discrete–time SGD updates. The continuous–time limit of SGD is given by
\[ dX_t = -\nabla f(X_t)\, dt + \sqrt{\eta}\, \sigma(X_t)\, dW_t, \tag{4} \]
where \(W_t\) is a standard Brownian motion in \(\mathbb{R}^d\) and the matrix \(\sigma(x)\) satisfies \(\sigma(x)\sigma(x)^T = D(x)\), where the diffusion matrix \(D(x)\) is the nonnegative–definite matrix
\[ D(x) = \mathbb{E}_{\gamma}\Big[ \big(\nabla f(x;\gamma) - \nabla f(x)\big)\big(\nabla f(x;\gamma) - \nabla f(x)\big)^T \Big]. \tag{5} \]
We refer to , , , , ,  for proofs of the convergence of the discrete SGD (2) to (4). The diffusion matrix \(D(x)\) depends on the weight vector \(x\), the architecture of the learning model (such as a neural network), the loss function, and the data set. When \(D(x)\) is a scalar multiple of the identity, independent of \(x\), we call \(D\) an isotropic diffusion matrix; otherwise, we call \(D\) non–isotropic (anisotropic). In realistic settings, when the architecture is a deep neural network, the diffusion matrix is usually anisotropic with a large condition number (see , ).
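The diffusion matrix (5) can be estimated empirically by sampling mini–batch gradients at a fixed weight vector; the following sketch uses a toy least–squares model, with all constants chosen only for illustration.

```python
import numpy as np

# Empirical estimate of the diffusion matrix D(x) in (5): the covariance of
# the mini-batch gradient at a fixed weight vector x.
rng = np.random.default_rng(1)
N, d, B = 300, 4, 8
A = rng.normal(size=(N, d))
b = rng.normal(size=N)          # generic data, so gradient noise is nonzero

def batch_grad(x, idx):
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

x = rng.normal(size=d)          # fixed weight vector at which D(x) is estimated
samples = np.array([batch_grad(x, rng.choice(N, B, replace=False))
                    for _ in range(5000)])
D = np.cov(samples.T)           # empirical D(x); depends on x and on the data

# Anisotropy shows up as a condition number well above 1.
eigs = np.linalg.eigvalsh(D)
print("condition number of D(x):", eigs[-1] / eigs[0])
```

Even in this small model the estimated \(D(x)\) is generally far from a multiple of the identity, which is the anisotropy the text refers to.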
For mathematical reasons we make the following simple assumptions regarding the loss function \(f\) and the diffusion matrix \(D(x)\).
We assume that the loss function \(f\) admits a gradient \(\nabla f\) that is \(L\)–Lipschitz:
\[ |\nabla f(x) - \nabla f(y)| \le L\, |x - y| \quad \text{for all } x, y \in \mathbb{R}^d. \]
We assume that \(\sigma(x)\) is piecewise Lipschitz in \(x\) and that the diffusion matrix \(D(x)\) is invertible for all \(x\), such that
\[ \sup_{x} \operatorname{Tr} D(x) < \infty. \]
Here and below Tr is the trace operator applied to a square matrix.
Although we assume here that the diffusion matrix is invertible for all choices of \(x\), this does not exclude it from being anisotropic, i.e., from having a large condition number.
2.2 Steady–State Distribution and Variational Inference.
The steady–state distribution of the weights \(X_t\), described by a density function \(\rho(t, x)\), evolves according to the Fokker–Planck equation (see [30, Chapter 8])
\[ \frac{\partial \rho}{\partial t} = \nabla \cdot \big( \nabla f(x)\, \rho \big) + \frac{\eta}{2}\, \nabla \cdot \nabla \cdot \big( D(x)\, \rho \big). \tag{8} \]
We make an assumption on the uniqueness of stationary–state distribution.
We assume that the steady–state distribution of the Fokker–Planck equation exists and is unique. We denote the density of the stationary distribution by \(\rho^{ss}(x)\); it satisfies
\[ \nabla \cdot \big( \nabla f(x)\, \rho^{ss} \big) + \frac{\eta}{2}\, \nabla \cdot \nabla \cdot \big( D(x)\, \rho^{ss} \big) = 0 \]
up to a multiplicative constant. In this way, the stationary density can be expressed in terms of a potential function \(\Phi(x)\) using a normalizing constant \(Z(\eta)\), which is the partition function in statistical physics, as
\[ \rho^{ss}(x) = \frac{1}{Z(\eta)} \exp\Big( -\frac{\Phi(x)}{\eta} \Big). \tag{11} \]
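A minimal numerical illustration of how a density of the form (11) behaves as the learning rate decreases: for a toy one–dimensional potential (chosen here purely for illustration), the probability mass concentrates around the global minimum of the potential.

```python
import numpy as np

# Stationary density of the form (11), rho(x) ∝ exp(-Phi(x)/eta), on a 1-D
# grid, for a toy double-well potential tilted so that x = -1 is the global
# minimum.  As eta decreases, the mass concentrates around that minimum.
xs = np.linspace(-3.0, 3.0, 2001)
Phi = 0.25 * (xs**2 - 1.0) ** 2 + 0.1 * xs    # toy potential, global min near -1

def mass_near(eta, center, width=0.3):
    w = np.exp(-(Phi - Phi.min()) / eta)      # subtract the min for stability
    w /= w.sum()                              # normalize: discrete analogue of 1/Z
    return w[np.abs(xs - center) < width].sum()

for eta in (1.0, 0.3, 0.1, 0.03):
    print(f"eta={eta}: mass within 0.3 of x=-1 is {mass_near(eta, -1.0):.3f}")
```

The printed mass tends to 1 as `eta` shrinks, which is the concentration phenomenon formalized later through the quasi–potential.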
Under Assumption 1, that is, the existence and uniqueness of a stationary (invariant) density \(\rho^{ss}\), it is guaranteed that as \(t \to \infty\) the SGD density \(\rho(t, \cdot)\) converges to \(\rho^{ss}\) in the sense of KL divergence. This is the variational inference interpretation of SGD. We have the following theorem, proved in [3, Theorem 5].
The functional \(F(\rho) = \mathrm{KL}(\rho \,\|\, \rho^{ss})\) decreases monotonically along the trajectories of the Fokker–Planck equation (8) as \(t \to \infty\) and converges to its minimum, which is zero, at the steady state. Here
\[ \mathrm{KL}(\rho \,\|\, \rho^{ss}) = \int \rho(x) \ln \frac{\rho(x)}{\rho^{ss}(x)}\, dx \]
is the KL divergence between \(\rho\) and \(\rho^{ss}\). Further, an energy–entropy split of the functional is given by
\[ F(\rho) = \frac{1}{\eta}\, \mathbb{E}_{x \sim \rho}\big[ \Phi(x) \big] - H(\rho) + \ln Z(\eta), \]
where \(H(\rho) = -\int \rho \ln \rho \, dx\) is the entropy of the distribution \(\rho\).
In particular, the above Theorem implies the following Corollary.
We have \(\rho(t, x) \to \rho^{ss}(x)\) as \(t \to \infty\) for every \(x\).
Let us fix the learning rate \(\eta\) and thus consider the case when \(\eta\) is a fixed parameter. In this case, based on the above theorem,  shows that the steady state of SGD in (11) places most of its probability mass in regions of the parameter space with small values of \(\Phi\). In this sense, the potential function \(\Phi\) is understood as the new objective function that SGD minimizes, instead of the original function \(f\) in (1) (see further discussion in ). In particular, the function \(\Phi\) captures information from both the original loss function \(f\) and the diffusion matrix \(D(x)\). This has been discussed as a manifestation of implicit regularization along the SGD trajectory (see ). In particular, if \(D(x) \equiv c\,\mathrm{Id}\) for a constant \(c > 0\) and without boundary conditions, then one can show that \(\Phi\) equals \(f\) up to a scalar multiple (see [3, Lemma 6]), so that isotropic diffusion brings in no new effects via implicit regularization.
2.3 The asymptotics as \(\eta \to 0\) and the quasi–potential.
The above considerations concern the case when \(\eta\) is fixed, rather than the limit \(\eta \to 0\). When \(\eta\) is small, the normalizing factor \(Z(\eta)\) in (11) also scales with \(\eta\), so that the asymptotics of the stationary distribution do not depend only on the potential function \(\Phi\).
(logarithmic equivalence) Two families of quantities \(A_\eta\) and \(B_\eta\) depending on \(\eta\) are said to be logarithmically equivalent, denoted
\[ A_\eta \asymp B_\eta, \]
if and only if
\[ \lim_{\eta \downarrow 0} \eta \ln \frac{A_\eta}{B_\eta} = 0. \]
In other words, for any \(\varepsilon > 0\) there exists some \(\eta_0 > 0\) such that
\[ e^{-\varepsilon/\eta} \le \frac{A_\eta}{B_\eta} \le e^{\varepsilon/\eta} \]
for any \(0 < \eta < \eta_0\).
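A quick numerical sanity check of this definition, using the hypothetical pair \(A_\eta = e^{-1/\eta}\) and \(B_\eta = \eta\, e^{-1/\eta}\):

```python
import numpy as np

# A_eta = exp(-1/eta) and B_eta = eta * exp(-1/eta) differ only by a
# polynomial prefactor, so eta * ln(A_eta/B_eta) = eta * ln(1/eta) -> 0:
# the two families are logarithmically equivalent even though their ratio
# A_eta / B_eta = 1/eta diverges.
gaps = []
for eta in (0.5, 0.1, 0.02, 0.004):
    gap = eta * np.log(1.0 / eta)     # eta * ln(A_eta / B_eta), in closed form
    gaps.append(gap)
    print(f"eta={eta:<6} eta*ln(A/B)={gap:.4f}")
```

This is why logarithmic equivalence only pins down the exponential rate: polynomial prefactors, such as the partition function below, are invisible at this resolution.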
Our goal in this paper is to argue that, as \(\eta \downarrow 0\), we have
\[ \rho^{ss}(x) \asymp \exp\Big( -\frac{V(x)}{\eta} \Big). \tag{14} \]
That is, for any \(\varepsilon > 0\) we can pick an \(\eta_0\) small enough such that
\[ \exp\Big( -\frac{V(x) + \varepsilon}{\eta} \Big) \le \rho^{ss}(x) \le \exp\Big( -\frac{V(x) - \varepsilon}{\eta} \Big) \]
for all \(0 < \eta < \eta_0\). In other words,
\[ V(x) = -\lim_{\eta \downarrow 0} \eta \ln \rho^{ss}(x). \]
The function \(V\) is called the global quasi–potential function (sometimes abbreviated as the quasi–potential function, depending on the context) and can be constructed from the original loss function \(f\) and the diffusion matrix \(D(x)\). Thus it depends on the weight vector \(x\), the architecture of the learning model (such as a neural network), the loss function, and the data set.
The asymptotic identity (14) does not involve a normalizing constant as in (11); indeed, the global quasi–potential function \(V\) has a global minimum point \(x^*\) with \(V(x^*) = 0\). Combined with the ansatz (14), this indicates that for small \(\eta\) the stationary density \(\rho^{ss}\) concentrates on a global minimum point of the quasi–potential \(V\). By Corollary 2.4, this global minimum point can be understood as the long–time behavior of the SGD dynamics as first \(t \to \infty\) and then \(\eta \downarrow 0\). This indicates that in the asymptotic regime \(\eta \downarrow 0\), SGD minimizes the quasi–potential \(V\) rather than the original function \(f\). Comparing (11) and (14), we obtain
\[ V(x) = \lim_{\eta \downarrow 0} \big( \Phi(x) + \eta \ln Z(\eta) \big). \]
Thus the two potential functions \(\Phi\) and \(V\) differ by a term involving the normalizing factor (partition function) \(Z(\eta)\). Moreover, for fixed \(\eta\), the potential \(\Phi\) may depend on \(\eta\), and thus we can think of \(V\) as the limit of \(\Phi\) as \(\eta \downarrow 0\).
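This relation between the two potentials and the partition function can be checked in closed form in the simplest Ornstein–Uhlenbeck case, where the stationary density is known exactly; the sketch below assumes \(f(x) = x^2/2\) and \(D = 1\), which are illustrative choices, not the paper's.

```python
import numpy as np

# Toy check of V = lim_{eta -> 0} (Phi + eta * ln Z(eta)) for the 1-D
# Ornstein-Uhlenbeck case f(x) = x^2/2 with isotropic noise D = 1.
# The stationary density of dX = -X dt + sqrt(eta) dW is exactly
# rho_ss(x) = exp(-x^2/eta) / Z(eta), so Phi(x) = x^2, Z(eta) = sqrt(pi*eta),
# and the quasi-potential should be V(x) = 2 f(x) = x^2.
x = 1.3
for eta in (1.0, 0.1, 0.01, 0.001):
    Z = np.sqrt(np.pi * eta)
    V_approx = x**2 + eta * np.log(Z)   # = -eta * ln rho_ss(x), computed stably
    print(eta, V_approx)
# V_approx tends to x^2 = 1.69 as eta decreases
```

Note that the correction `eta * ln Z` vanishes in the limit here; in multi–well landscapes the analogous correction is what reshapes \(\Phi\) into the global quasi–potential.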
In the next two sections we demonstrate how \(V\) is calculated from the diffusion matrix \(D(x)\) and the loss landscape \(f\).
3 Local quasi–potential: the case of a convex loss function.
Let us assume in this section that the original loss function \(f\) is convex and admits only one minimum point, which is also its global minimum point; without loss of generality, let it be the origin \(O\). In this section we introduce the local quasi–potential function and connect it to the SGD noise covariance structure via a partial differential equation of Hamilton–Jacobi type. The analysis is based on interpreting LDT as a path–integral theory on trajectory space.
3.1 SGD as a small random perturbation of Gradient Descent (GD).
For small \(\eta\), the SGD process \(X_t\) in (4) has trajectories that are close to the Gradient Descent (GD) flow characterized by the deterministic equation
\[ \frac{d x_t^{GD}}{dt} = -\nabla f(x_t^{GD}). \tag{16} \]
In fact, it can easily be justified (see Appendix C) that we have the following.
Under Assumption 1 we have, for any \(T > 0\) and \(\delta > 0\),
\[ \mathbb{P}\Big( \sup_{0 \le t \le T} |X_t - x_t^{GD}| > \delta \Big) \le C \eta \tag{17} \]
for some constant \(C = C(T, \delta) > 0\).
When (17) holds, we simply say that \(X_t\) and \(x_t^{GD}\) are \(\delta\)–close on \([0, T]\). Thus in finite time the SGD process is attracted to a neighborhood of the origin \(O\). Since \(O\) is the only minimum point of the convex loss function \(f\), every point is attracted by the gradient flow (16) to \(O\). Let us take any open set \(G\) containing the origin \(O\). Due to the attraction property of the deterministic gradient descent flow and the \(\delta\)–closeness of the SGD path to it, the SGD process will spend a long time in this neighborhood of the origin before it escapes from \(G\) and hits somewhere on \(\partial G\). Such an escape is due to the small random term in (4), which causes fluctuations of the SGD process away from the deterministic trajectory of the gradient descent flow. In terms of optimization, SGD in this case finds the minimum point of the convex function \(f\) just as GD does. However, in the presence of multiple local minimum points, the escape caused by the small randomness in the SGD process (4) is a key feature that leads to its regularization properties, such as the selection of flat minimizers over sharp minimizers (see ). The escape properties from basins of attraction due to small random perturbations can also be studied in the case of just one minimum point \(O\): we take an open neighborhood \(G\) of \(O\) and consider the escape behavior of the SGD process from the set \(G\) to its boundary \(\partial G\). This will be done via LDT in the next subsection.
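The closeness of the SGD diffusion to the GD flow can be visualized with a short Euler–Maruyama simulation; the quadratic loss and all constants below are illustrative assumptions, not the paper's.

```python
import numpy as np

# Euler-Maruyama simulation of the SGD diffusion (4) alongside the GD flow
# (16) for the toy convex loss f(x) = 0.5*|x|^2 with identity diffusion.
# The maximal gap on [0, T] shrinks as eta decreases.
rng = np.random.default_rng(2)
dt, T = 0.01, 5.0
n = int(T / dt)
x0 = np.array([2.0, -1.0])

def simulate(eta):
    x_sgd, x_gd, max_gap = x0.copy(), x0.copy(), 0.0
    for _ in range(n):
        noise = np.sqrt(eta * dt) * rng.normal(size=2)   # sqrt(eta)*sigma*dW
        x_sgd = x_sgd - x_sgd * dt + noise               # grad f(x) = x
        x_gd = x_gd - x_gd * dt
        max_gap = max(max_gap, float(np.linalg.norm(x_sgd - x_gd)))
    return max_gap

gaps = {eta: simulate(eta) for eta in (0.1, 0.01, 0.001)}
for eta, g in gaps.items():
    print(f"eta={eta}: sup_t |X_t - x_t^GD| = {g:.4f}")
```

On any fixed window the gap is of order \(\sqrt{\eta}\); the escape from the well happens only on time scales exponentially long in \(1/\eta\), which is where LDT enters.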
3.2 Large Deviations Theory (LDT) interpreted as a path integral in the trajectory space.
To quantitatively characterize such escape properties, we use Large Deviations Theory (LDT) (see , , ). Roughly, this theory assigns probability weights in path space to the solution of (4). That is to say, for a given regular connecting path \(\phi_t\), \(0 \le t \le T\), with \(\phi_0 = x\), some \(\delta\) small enough, and any \(\eta\) small enough, we have
\[ \mathbb{P}\Big( \sup_{0 \le t \le T} |X_t - \phi_t| < \delta \Big) \asymp \exp\Big( -\frac{S_{0T}(\phi)}{\eta} \Big), \tag{18} \]
where \(\asymp\) denotes logarithmic equivalence. The asymptotic (18) can be understood as providing a density function for the process \(X_t\) in path space:
\[ \rho_{\mathrm{path}}(\phi) \asymp \exp\Big( -\frac{S_{0T}(\phi)}{\eta} \Big). \tag{19} \]
The precise statement that leads to the asymptotic (18) is given in Appendix A. In LDT, the functional \(S_{0T}(\phi)\) in (18) is called the rate functional. This rate functional can, however, also be interpreted as the action functional that generates the solution of the Fokker–Planck equation (8). In fact, following Feynman's path–integral approach to quantum mechanics (see , ), formally integrating over individual paths according to their weights in (19), we have
\[ \rho(T, y) = \int_{\phi_0 = x, \ \phi_T = y} \exp\Big( -\frac{S_{0T}(\phi)}{\eta} \Big)\, \mathcal{D}\phi, \tag{20} \]
where the integral is a formal integration over path space with “path differential” \(\mathcal{D}\phi\), and \(\rho(T, y)\) is the solution of the Fokker–Planck equation (8) (a partial differential equation) with initial density \(\rho(0, \cdot) = \delta_x\). In Feynman's theory, the term “action” corresponds to the exponent in the above integration. In this way, from (20) we get
\[ -\lim_{\eta \downarrow 0} \eta \ln \rho(T, y) = \inf\big\{ S_{0T}(\phi) : \phi_0 = x, \ \phi_T = y \big\}. \tag{21} \]
Assuming that the limits \(\eta \downarrow 0\) and \(T \to \infty\) are interchangeable, we then have by (21) that
\[ -\lim_{\eta \downarrow 0} \eta \ln \rho^{ss}(y) = \lim_{T \to \infty} \inf\big\{ S_{0T}(\phi) : \phi_0 = x, \ \phi_T = y \big\} = \inf_{T > 0} \inf\big\{ S_{0T}(\phi) : \phi_0 = O, \ \phi_T = y \big\}. \tag{22} \]
The last equality in the above display is due to the fact that we can always take a path \(\phi\) that first follows the gradient flow from \(x\) down to \(O\) at negligible cost and then stays at \(O\), i.e., \(\phi_t = O\), for an arbitrarily long time before exiting. Equation (22) demonstrates a relation between the stationary measure \(\rho^{ss}\) and the “action functional” introduced in LDT.
3.3 Local quasi–potential function as the solution to a variational problem and Hamilton–Jacobi equation.
From the above considerations, we can define a local quasi–potential function as
\[ V_{loc}(x, y) = \inf_{T > 0} \inf\big\{ S_{0T}(\phi) : \phi_0 = x, \ \phi_T = y \big\}, \tag{23} \]
which matches the identity (14) in the form
\[ \rho^{ss}(y) \asymp \exp\Big( -\frac{V_{loc}(O, y)}{\eta} \Big). \tag{24} \]
This implies that when the gradient system (16) has only one stable attractor, the quasi–potential \(V\) is given by the local quasi–potential \(V_{loc}\), which is the solution of the variational problem (23).
One may observe that the local quasi–potential function defined in (23) depends on the initial condition \(x\), while the stationary measure \(\rho^{ss}\) does not. In fact, since (18) and (19) describe the density in path space of the process \(X_t\) started from \(x\), the quantity in (23) a priori carries a dependence on \(x\). However, when the loss function \(f\) is convex, all initial points are attracted by the gradient flow (16) to the origin \(O\), and the flow segment from \(x\) to \(O\) contributes zero action. Combining these two effects, we see that \(V_{loc}(x, y) = V_{loc}(O, y)\), so that in the convex case the local quasi–potential does not depend on the initial point. In general, however, when \(f\) has several different local minimum points, one has to specify the local quasi–potential with respect to the initial point \(x\).
When obtaining (22) we exchanged the order of the limits \(\eta \downarrow 0\) and \(T \to \infty\). This is legitimate because the gradient flow (16) has only one attractor, so that over long times we do not expect transitions between different attractors (transitions between different attractors are discussed in Section 4). That being said, the asymptotics of \(\rho^{ss}\) do not depend on the order of the limits \(\eta \downarrow 0\) and \(T \to \infty\), resulting in the exponential form (24) of the stationary measure. See  for more details.
We have seen that, according to our general framework relating the variational inference view of SGD to the stationary measure, SGD minimizes the local quasi–potential function \(V_{loc}\) when the loss function is convex. The function \(V_{loc}\) is the solution of the variational problem (23). In terms of implicit regularization, the quasi–potential is related to the diffusion matrix \(D(x)\) in (5) by solving the variational problem (23) with an explicit form of the action. According to LDT (see , , ,  as well as Appendix A), when the gradient flow (16) has only one attractor \(O\), the SGD diffusion equation (4) satisfies a large deviations principle with action functional (rate functional) given by the explicit formula
\[ S_{0T}(\phi) = \frac{1}{2} \int_0^T \big( \dot{\phi}_t + \nabla f(\phi_t) \big)^T D(\phi_t)^{-1} \big( \dot{\phi}_t + \nabla f(\phi_t) \big)\, dt. \tag{25} \]
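The action functional (25) can be discretized directly. In the sketch below (1-D, \(f(x) = x^2/2\), \(D = 1\), all assumed purely for illustration), the downhill GD path has essentially zero action, while the time–reversed uphill path from near the minimum to \(x = 1\) has action close to \(2(f(1) - f(0)) = 1\), consistent with \(V = 2f\) for isotropic noise.

```python
import numpy as np

# Discretized action functional (25) for D = Id in 1-D with f(x) = x^2/2:
#   S[phi] = 0.5 * integral of |phi' + f'(phi)|^2 dt,   f'(x) = x.
def action(phi, dt):
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[1:] + phi[:-1])              # midpoint rule for phi
    return 0.5 * np.sum((dphi + mid) ** 2) * dt

dt = 1e-4
T = np.log(1.0 / 0.01)                            # time for 0.01*e^t to reach 1
t = np.arange(0.0, T, dt)
uphill = 0.01 * np.exp(t)                         # solves phi' = +f'(phi)
downhill = uphill[::-1]                           # same curve, traversed downhill

print("uphill action   ≈", action(uphill, dt))    # ≈ 1.0 = 2*(f(1) - f(0))
print("downhill action ≈", action(downhill, dt))  # ≈ 0: the GD flow is free
```

The uphill minimizer being the time–reversed gradient flow is special to gradient systems with isotropic noise; with a nontrivial \(D(x)\) the optimal exit path bends toward the noisy directions.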
Combining the formula (25) for the action functional with the variational formulation of the quasi–potential, one obtains by classical calculus of variations that the local quasi–potential function satisfies a partial differential equation of Hamilton–Jacobi type involving the diffusion matrix \(D(x)\). We have the following.
The local quasi–potential \(V_{loc}\) is a solution of the Hamilton–Jacobi equation
\[ \frac{1}{2}\, \nabla V_{loc}(y)^T D(y)\, \nabla V_{loc}(y) - \nabla f(y)^T \nabla V_{loc}(y) = 0 \tag{26} \]
with boundary condition
\[ V_{loc}(O) = 0. \tag{27} \]
The proof is found in Appendix B. The dependence of the Hamilton–Jacobi equation (26) on the diffusion matrix \(D(y)\) can be viewed as a quantitative manifestation of implicit regularization through the quasi–potential. In particular, when \(D(y) \equiv \mathrm{Id}\), it is easy to see that \(V_{loc} = 2f\) (with \(f(O) = 0\)) is a solution. This justifies the prediction that for isotropic noise and a single minimizer, the quasi–potential is just the original loss function up to a multiplicative constant. Later in this section we provide an example with an anisotropic noise covariance structure and discuss the properties of the quasi–potential in that case by making use of Theorem 3.4.
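Equation (26) can be verified numerically for a simple quadratic loss with constant diagonal diffusion; the candidate quasi–potential below is an assumed closed form for this toy case, not a formula from the paper.

```python
import numpy as np

# Numerical check of the Hamilton-Jacobi equation (26),
#   0.5 * <grad V, D grad V> - <grad f, grad V> = 0,   V(O) = 0,
# for the toy quadratic loss f(x) = 0.5*(x1^2 + x2^2) and constant diagonal
# diffusion D = diag(d1, d2).  A candidate quasi-potential for this toy case
# is V(x) = x1^2/d1 + x2^2/d2; when d1 = d2 = 1 it reduces to V = 2f.
d1, d2 = 4.0, 0.5
D = np.diag([d1, d2])

def grad_f(x):
    return x                                   # gradient of 0.5*|x|^2

def grad_V(x):
    return np.array([2 * x[0] / d1, 2 * x[1] / d2])

rng = np.random.default_rng(3)
for _ in range(5):
    x = rng.normal(size=2)
    p = grad_V(x)
    residual = 0.5 * p @ D @ p - grad_f(x) @ p
    print(abs(residual))                       # ≈ 0 up to floating-point error
```

For \(d_1 \neq d_2\) this candidate differs from \(2f\), a concrete instance of the noise covariance reshaping the effective objective.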
3.4 Escape properties from local minimum points in terms of the local quasi–potential.
Another remarkable feature of the local quasi–potential is that it characterizes the escape properties from local minimum points. As described in the first paragraph of this section, the escape from sharp minima to flat minima is a key feature leading to good generalization (see ). The LDT estimate (18) provides a tool for obtaining exponential estimates of the exit probability and the mean first exit time from the basin of an attractor. To illustrate this, consider the set–up of this section, where the loss function \(f\) is convex and admits the single attractor \(O\). Let \(G\) be an open neighborhood of \(O\) with boundary \(\partial G\). Let the process \(X_t\) start its motion from an initial point \(x \in G\) and consider its first hitting time of \(\partial G\):
\[ \tau^{\eta} = \inf\{ t > 0 : X_t \in \partial G \}. \]
Suppose the SGD process starts from some \(x \in G\). Let \(B_{\rho}\) be a closed ball around \(O\) with radius \(\rho\). Pick some small \(\rho > 0\) and let \(\gamma = \partial B_{\rho}\) and \(\Gamma = \partial B_{2\rho}\), such that \(B_{2\rho} \subset G\). We introduce a sequence of Markov times in the following way: let \(\tau_0 = 0\), \(\sigma_n = \inf\{ t \ge \tau_n : X_t \in \gamma \}\) and \(\tau_{n+1} = \inf\{ t \ge \sigma_n : X_t \in \Gamma \cup \partial G \}\), \(n = 0, 1, 2, \ldots\). Consider the Markov chain \(Z_n = X_{\tau_n}\), \(n = 1, 2, \ldots\). The state space of \(Z_n\) is \(\Gamma \cup \partial G\). Together with the hitting time we also define the one–step transition probability by
\[ p^{\eta}(x) = \mathbb{P}_x\big( Z_1 \in \partial G \big). \]
Then we have the following theorem characterizing the exponential asymptotics of the exit probability and the mean exit time.
Assume that the boundary \(\partial G\) of the domain is smooth and
\[ \langle \nabla f(y), n(y) \rangle > 0 \]
for \(y \in \partial G\), where \(n(y)\) is the exterior normal vector to the boundary of \(G\). Then for \(x \in G\) we have the asymptotic
\[ p^{\eta}(x) \asymp \exp\Big( -\frac{1}{\eta} \min_{y \in \partial G} V_{loc}(y) \Big), \tag{29} \]
and the mean exit time has the exponential asymptotic
\[ \mathbb{E}_x \tau^{\eta} \asymp \exp\Big( \frac{1}{\eta} \min_{y \in \partial G} V_{loc}(y) \Big). \tag{30} \]
It can also be shown that the local quasi–potential is related to the first exit position. In fact, assuming that \(y^* \in \partial G\) is the only minimum point of \(V_{loc}\) on \(\partial G\), then as \(\eta \downarrow 0\) the first exit position approaches this minimum point, i.e.,
\[ \lim_{\eta \downarrow 0} \mathbb{P}_x\big( |X_{\tau^{\eta}} - y^*| < \delta \big) = 1 \quad \text{for every } \delta > 0. \]
From here we see that the escape properties of the process \(X_t\) from a local minimum point, such as the exit probability, the mean escape time and even the first exit position, are governed by the quasi–potential. Combined with Theorem 3.4, this indicates that the noise covariance structure affects the escape properties from local minimum points. Using these results, the next example shows that, in some cases, anisotropic noise helps the SGD process escape faster from a local minimum point than isotropic noise. This validates some of the predictions in .
Let \(x = (x_1, x_2) \in \mathbb{R}^2\) and consider a convex quadratic loss function \(f\). Let the neighborhood \(G\) be a ball around the origin, with initial condition at the origin, and let the diffusion matrix \(D\) be diagonal with a parameter \(\lambda\) controlling its anisotropy. Notice that when \(\lambda = 0\), \(D\) is isotropic, and when \(\lambda \neq 0\), \(D\) is anisotropic. It is easy to check, by verifying equation (26), that for each \(\lambda\) the local quasi–potential is an explicit quadratic form whose minimum over \(\partial G\) decreases as the anisotropy increases, so that by Theorem 3.5 the mean exit time is exponentially shorter in the anisotropic case. We have thus justified that, in this case, anisotropic noise leads to faster escape than isotropic noise.
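The faster anisotropic escape can also be observed directly by Monte-Carlo simulation of (4); the quadratic loss, exit domain, and diffusion parametrization below are toy assumptions (not the paper's exact constants), with the trace of \(D\) held fixed so that only the anisotropy varies.

```python
import numpy as np

# Mean exit time from the unit disk for dX = -X dt + sqrt(eta) sigma dW with
# f(x) = 0.5*|x|^2 and D = diag(1+lam, 1-lam) (trace fixed at 2).  The
# quasi-potential minimum on the boundary is 1/(1+lam), so larger lam means
# a lower effective barrier and an exponentially shorter mean exit time.
def mean_exit_time(lam, eta=0.3, dt=0.005, n_paths=200, seed=4):
    rng = np.random.default_rng(seed)
    sig = np.sqrt(eta * dt * np.array([1.0 + lam, 1.0 - lam]))
    x = np.zeros((n_paths, 2))
    t = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    while alive.any():
        n_alive = int(alive.sum())
        x[alive] += -x[alive] * dt + sig * rng.normal(size=(n_alive, 2))
        t[alive] += dt
        alive &= (x**2).sum(axis=1) < 1.0      # a path dies once it exits
    return float(t.mean())

t_iso = mean_exit_time(lam=0.0)
t_aniso = mean_exit_time(lam=0.8)
print("isotropic mean exit time:  ", t_iso)
print("anisotropic mean exit time:", t_aniso)
```

The exit also concentrates along the noisy coordinate axis, in line with the first–exit–position statement above.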
4 Global quasi–potential: the case of multiple global minima and the stochastic dynamics of SGD.
In the previous section we introduced a local quasi–potential function for the case when the loss function is convex and admits only one stable attractor \(O\). In terms of the stationary measure, from (22) our local quasi–potential function can be viewed as the limit
\[ V_{loc}(y) = \lim_{T \to \infty} \Big( -\lim_{\eta \downarrow 0} \eta \ln \rho(T, y) \Big). \]
We explained in Remark 3.3 that the exchange of the limit order in the above demonstration is due to the fact that the loss function is convex and admits only one minimum point \(O\). In this section we consider the case when the loss function is non–convex and possibly admits several different local minimum points. Under this scenario, the exchange of the limit order is not valid, and instead we have
\[ V(y) = -\lim_{\eta \downarrow 0} \eta \ln \Big( \lim_{T \to \infty} \rho(T, y) \Big) = -\lim_{\eta \downarrow 0} \eta \ln \rho^{ss}(y). \tag{32} \]
This defines the global quasi–potential function \(V\), sometimes abbreviated as the quasi–potential function. The asymptotic (32) indicates that we have the expression of the stationary measure demonstrated in (14):
\[ \rho^{ss}(y) \asymp \exp\Big( -\frac{V(y)}{\eta} \Big). \tag{33} \]
This relation between the quasi–potential and the stationary measure indicates that SGD minimizes the quasi–potential function \(V\). However, such a definition alone does not provide useful information on how to construct the quasi–potential, nor on how it is related to the original loss function \(f\) and the covariance structure \(D(x)\).
In fact, classical Large Deviations Theory (see , ) provides a systematic, yet rather complicated, way to construct the global quasi–potential from its local versions. However, for the purpose of studying SGD we are mainly interested in the locations of the local and global minimum points of \(V\), since these are the points at which the SGD process will finally be trapped. This can be understood from the dynamics of the SGD process (4) via the LDT Theorem 3.5. Let us illustrate this via a one–dimensional (i.e., in \(\mathbb{R}\)) example as follows.
Consider a loss function defined in a piecewise way, so that it has two local minimum points \(O_1\) and \(O_2\), with the minimal values of \(f\) at \(O_1\) and \(O_2\) both equal to the same constant. The corresponding two basins of attraction are \(G_1\) and \(G_2\), in each of which the function \(f\) increases from the minimum to the common barrier between the wells. Let us consider the corresponding piecewise–defined gradient function, together with a diffusion coefficient taking different constant values on the two basins. In a very similar fashion to Example 3.1, we can calculate the local quasi–potentials \(V_1\) and \(V_2\) with respect to \(O_1\) and \(O_2\) within their respective basins. Using the same reasoning as in Example 3.1, this indicates that the two wells carry different quasi–potential barriers, \(\min_{\partial G_1} V_1 \neq \min_{\partial G_2} V_2\), even though the loss barriers are equal.
When the SGD process in (4) enters one of the basins of attraction, say \(G_1\), it is attracted to \(O_1\) by the gradient flow dynamics and fluctuates around it due to the small noise term in (4). The probability that it exits this basin is given by (29) in Theorem 3.5, that is,
\[ p^{\eta} \asymp \exp\Big( -\frac{1}{\eta} \min_{y \in \partial G_1} V_1(y) \Big). \]
Once the process reaches \(\partial G_1\), with positive probability it enters \(G_2\) and gets attracted to \(O_2\). In this way, the transition from \(O_1\) to \(O_2\) can be viewed as a Markov chain step with transition probability
\[ p_{12} \asymp \exp\Big( -\frac{1}{\eta} \min_{y \in \partial G_1} V_1(y) \Big). \]
Similarly, from \(O_2\) a transition to \(O_1\) may happen with probability
\[ p_{21} \asymp \exp\Big( -\frac{1}{\eta} \min_{y \in \partial G_2} V_2(y) \Big). \]
Since \(O_1\) and \(O_2\) are the two attractors of the SGD process with this loss function, the SGD dynamics spends most of its time near \(O_1\) and \(O_2\). Thus the dynamics can be viewed as a Markov chain on \(\{O_1, O_2\}\) with transition probabilities \(p_{12}\) and \(p_{21}\) between them. This implies that the stationary measure concentrates on \(O_1\) and \(O_2\), with
\[ \frac{\rho^{ss}(O_1)}{\rho^{ss}(O_2)} \asymp \frac{p_{21}}{p_{12}}. \]
Set \(\kappa = \min_{\partial G_2} V_2 - \min_{\partial G_1} V_1\) and suppose \(\kappa > 0\). Then \(\rho^{ss}(O_1) \to 0\) and \(\rho^{ss}(O_2) \to 1\) as \(\eta \downarrow 0\). Combined with (33), this indicates that among the two local minimum points \(O_1\) and \(O_2\) of the loss function in this example, the quasi–potential \(V\) has local minima at \(O_1\) and \(O_2\) with \(V(O_2) = 0 < V(O_1) = \kappa\). This indicates that the SGD process tends to select \(O_2\) rather than \(O_1\) as its final resting place, even though the original loss function takes the same value at both \(O_1\) and \(O_2\). This “selection of a specific local minimum point” can be viewed as a regularization induced by the anisotropic noise covariance structure.
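The metastable selection described above can be reproduced in a small simulation; the smooth double well and the two–level noise below are a stand-in for the piecewise example of this section, with all constants assumed for illustration.

```python
import numpy as np

# 1-D SGD-like dynamics in a symmetric double well with state-dependent
# ("anisotropic") noise:
#   f(x) = (x^2 - 1)^2 / 4, minima at -1 and +1 with equal loss values,
#   D(x) = d_left for x < 0 and d_right for x > 0, with d_left > d_right.
# The quasi-potential barrier out of each well scales like 2*Delta_f / d,
# so the noisier left well is easier to leave, and the trajectory spends
# most of its time near the quieter minimum x = +1.
rng = np.random.default_rng(5)
eta, dt, n_steps = 0.2, 0.01, 400_000
d_left, d_right = 2.0, 0.5
sig_left = np.sqrt(eta * d_left * dt)
sig_right = np.sqrt(eta * d_right * dt)

x, time_right = 0.0, 0
for z in rng.normal(size=n_steps):
    sig = sig_left if x < 0 else sig_right
    x += -x * (x * x - 1.0) * dt + sig * z     # drift = -f'(x) = -x(x^2 - 1)
    time_right += x > 0

frac_right = time_right / n_steps
print("fraction of time spent near x = +1:", frac_right)
```

Both minima have the same loss value, yet the empirical occupation is heavily skewed toward one of them: a direct realization of the Markov-chain selection mechanism.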
5 Numerical Experiments.
We have performed numerical experiments for Example 3.1, with a fixed choice of \(\eta\) and stepsize. In Figure 1 (a)–(d) the number of iterations is fixed. Figures 1 (a) and (c) (blue) are for isotropic noise, and Figures 1 (b) and (d) are for anisotropic noise. Figures 1 (a) and (b) show the evolution of the radial processes