A geometric interpretation of stochastic gradient descent using diffusion metrics

by R. Fioresi et al.

Stochastic gradient descent (SGD) is a key ingredient in the training of deep neural networks, and yet its geometrical significance remains elusive. We study a deterministic model in which the trajectories of our dynamical systems are described via geodesics of a family of metrics arising from the diffusion matrix. These metrics encode information about the highly non-isotropic gradient noise in SGD. We establish a parallel with General Relativity models, where the role of the electromagnetic field is played by the gradient of the loss function. We work out an example for a two-layer network.
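
To make the notion of non-isotropic gradient noise concrete, here is a minimal sketch. It assumes the common definition of the diffusion matrix as the empirical covariance of the per-sample gradients, which the abstract itself does not spell out. The least-squares model, the toy data, and the names `per_sample_gradients` and `diffusion_matrix` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def per_sample_gradients(w, X, y):
    """Gradients of the per-sample losses 0.5 * (x_i . w - y_i)**2, one row per sample."""
    residuals = X @ w - y            # shape (N,)
    return residuals[:, None] * X    # shape (N, d)

def diffusion_matrix(w, X, y):
    """Empirical covariance of the per-sample gradients at w (assumed definition)."""
    g = per_sample_gradients(w, X, y)
    centered = g - g.mean(axis=0)
    return centered.T @ centered / len(X)

rng = np.random.default_rng(0)
# Hypothetical toy data: the two features have very different scales,
# so the gradient noise is far from isotropic.
X = rng.normal(size=(500, 2)) * np.array([5.0, 0.5])
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=500)
w = np.zeros(2)

D = diffusion_matrix(w, X, y)
eigenvalues = np.linalg.eigvalsh(D)            # returned in ascending order
anisotropy = eigenvalues[-1] / eigenvalues[0]  # equals 1 only for isotropic noise
print("diffusion matrix:\n", D)
print("anisotropy ratio:", anisotropy)
```

An anisotropy ratio well above 1 means the noise is much stronger along some directions of parameter space than others, which is the kind of information a metric built from the diffusion matrix would carry.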




The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent

Understanding the generalization of deep learning has raised lots of con...

Law of Balance and Stationary Distribution of Stochastic Gradient Descent

The stochastic gradient descent (SGD) algorithm is the algorithm we use ...

On uniform-in-time diffusion approximation for stochastic gradient descent

The diffusion approximation of stochastic gradient descent (SGD) in curr...

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

Stochastic gradient descent (SGD) is widely believed to perform implicit...

Deep Gradient Boosting

Stochastic gradient descent (SGD) has been the dominant optimization met...

On the Stochastic Gradient Descent and Inverse Variance-flatness Relation in Artificial Neural Networks

Stochastic gradient descent (SGD), a widely used algorithm in deep-learn...

Online Stochastic Gradient Descent Learns Linear Dynamical Systems from A Single Trajectory

This work investigates the problem of estimating the weight matrices of ...
