Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis

06/04/2021
by Stephan Wojtowytsch, et al.

The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable parameters are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or an advanced SGD-based algorithm. In a continuous time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime the optimization algorithm prefers 'flat' minima of the objective function, in a sense that differs from the flat minimum selection of continuous time SGD with homogeneous noise.
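
For orientation, and not as a quotation from the paper, the continuous time model can be sketched as a stochastic differential equation; the precise form of the noise coefficient below is an assumption, based on reading 'machine learning scaling' as noise whose intensity is controlled by the value of the objective and which therefore vanishes at global minimizers:

\[
  \mathrm{d}\theta_t \;=\; -\nabla f(\theta_t)\,\mathrm{d}t \;+\; \sqrt{\eta\,\Sigma(\theta_t)}\,\mathrm{d}W_t ,
  \qquad
  \Sigma(\theta) \;\approx\;
  \begin{cases}
    \sigma^2\, I & \text{(homogeneous noise)}\\
    \sigma^2 f(\theta)\, I & \text{(machine learning scaling, assumed form)}
  \end{cases}
\]

where f is the loss, \eta the learning rate, and W_t a Brownian motion. In the homogeneous case the noise is equally strong everywhere, while under the assumed machine learning scaling it dies out on the set of global minimizers, which is one reason the two regimes can select 'flat' minima in different senses.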

Related research

05/04/2021 - Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
Stochastic gradient descent (SGD) is one of the most popular algorithms ...

11/17/2016 - Stochastic Gradient Descent in Continuous Time
Stochastic gradient descent in continuous time (SGDCT) provides a comput...

01/07/2018 - Theory of Deep Learning IIb: Optimization Properties of SGD
In Theory IIb we characterize with a mix of theory and experiments the o...

06/16/2020 - Directional Pruning of Deep Neural Networks
In the light of the fact that the stochastic gradient descent (SGD) ofte...

09/19/2022 - On the Theoretical Properties of Noise Correlation in Stochastic Optimization
Studying the properties of stochastic noise to optimize complex non-conv...

03/22/2023 - 𝒞^k-continuous Spline Approximation with TensorFlow Gradient Descent Optimizers
In this work we present an "out-of-the-box" application of Machine Learn...

12/07/2021 - A Continuous-time Stochastic Gradient Descent Method for Continuous Data
Optimization problems with continuous data appear in, e.g., robust machi...
