On the Stochastic Gradient Descent and Inverse Variance-flatness Relation in Artificial Neural Networks

07/11/2022
by Xia Xiong, et al.

Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing study of the theoretical principles behind its success. A recent work uncovered a generic inverse variance-flatness (IVF) relation between the variance of neural weights and the flatness of the loss landscape near solutions under SGD [Feng Tu, PNAS 118,0027 (2021)]. To investigate this seeming violation of statistical principles, we deploy a stochastic decomposition to analyze the dynamical properties of SGD. The method constructs the true "energy" function that can be used in the Boltzmann distribution. The new energy differs from the usual cost function and explains the IVF relation under SGD. We further verify the scaling relation identified in Feng's work. Our approach may bridge the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with the potential to yield better algorithms for the latter.
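
As a rough illustration of how an "energy" entering a Boltzmann distribution can differ from the cost function, consider a minimal sketch that is not taken from the paper. It assumes SGD linearized near a minimum and modeled as a Langevin equation, with Hessian H, a constant noise (diffusion) matrix D that commutes with H, and a small noise strength \epsilon. The symbols w, H, D, \epsilon, \phi, h_i and d_i are all introduced here for illustration only.

```latex
% Assumed linearized SGD dynamics near a minimum at w = 0
\dot{w} = -H w + \zeta(t), \qquad
\langle \zeta(t)\,\zeta(t')^{\top} \rangle = 2\epsilon D\,\delta(t - t')

% Stationary distribution in Boltzmann form, with an energy \phi
% that differs from the loss L(w) = \tfrac{1}{2} w^{\top} H w
\rho_{ss}(w) \propto \exp\!\left(-\frac{\phi(w)}{\epsilon}\right), \qquad
\phi(w) = \tfrac{1}{2}\, w^{\top} D^{-1} H\, w

% Variance along the i-th shared eigendirection of H and D
% (h_i, d_i are the corresponding eigenvalues)
\sigma_i^2 = \epsilon\, \frac{d_i}{h_i}
```

Under these assumptions, isotropic noise (constant d_i) gives \sigma_i^2 \propto 1/h_i, so flatter directions fluctuate more, as in equilibrium statistical mechanics. If instead the noise grows faster than the curvature (for example d_i \propto h_i^2), then \sigma_i^2 \propto h_i and flatter directions have smaller variance, which is qualitatively an inverse variance-flatness relation. The actual noise structure of SGD and the construction used in the paper may differ from this toy case.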
