On the Stochastic Gradient Descent and Inverse Variance-flatness Relation in Artificial Neural Networks

07/11/2022
by Xia Xiong, et al.

Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing study of the theoretical principles behind its success. A recent work uncovered a generic inverse variance-flatness (IVF) relation between the variance of neural weights and the flatness of the loss landscape near solutions found by SGD [Feng and Tu, PNAS 118, e2015617118 (2021)]. To investigate this seeming violation of statistical principles, we deploy a stochastic decomposition to analyze the dynamical properties of SGD. The method constructs the true "energy" function that enters the Boltzmann distribution. This energy differs from the usual cost function and explains the IVF relation under SGD. We further verify the scaling relation identified in Feng and Tu's work. Our approach may bridge the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with the potential for better algorithms in the latter.
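The IVF relation can be illustrated with a toy simulation that is not the paper's derivation: SGD-like dynamics on a 2D quadratic loss, with the assumption (chosen for illustration only) that gradient-noise strength grows with curvature, as minibatch noise roughly does near a minimum. Under this assumption the stationary weight variance is larger along the sharper direction, i.e., variance is inversely related to flatness, opposite to the equilibrium Boltzmann intuition that flat directions fluctuate more.

```python
import numpy as np

# Toy sketch of an inverse variance-flatness (IVF) trend under SGD-like
# dynamics on L(theta) = 0.5 * sum(h_i * theta_i^2).
# Assumption (not from the paper): per-direction gradient-noise std scales
# with curvature h_i. Then the stationary variance of theta_i is
#   Var(theta_i) = eta * sigma^2 * h_i / (2 - eta * h_i),
# which GROWS with curvature: flatter direction -> smaller variance.

rng = np.random.default_rng(0)

h = np.array([1.0, 10.0])       # curvatures: index 0 flat, index 1 sharp
eta = 0.01                      # learning rate
sigma = 1.0                     # noise scale
steps, burn_in = 200_000, 20_000

theta = np.zeros(2)
samples = []
for t in range(steps):
    noise = sigma * h * rng.standard_normal(2)   # noise std proportional to h_i
    theta -= eta * (h * theta + noise)           # SGD-like update
    if t >= burn_in:
        samples.append(theta.copy())

var = np.var(np.array(samples), axis=0)
# The sharp direction (h=10) should carry roughly 10x the variance of the
# flat direction (h=1) in this toy model.
print(var, var[1] / var[0])
```

The hypothetical noise model (std proportional to `h`) is what produces the inversion; with curvature-independent noise the same simulation would instead show the usual equilibrium result, variance proportional to flatness.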


