Analysis of Stochastic Gradient Descent in Continuous Time

04/15/2020
by Jonas Latz et al.

Stochastic gradient descent is an optimisation method that combines classical gradient descent with random subsampling within the target functional. In this work, we introduce the stochastic gradient process as a continuous-time representation of stochastic gradient descent. The stochastic gradient process is a dynamical system coupled with a continuous-time Markov process living on a finite state space. The dynamical system, a gradient flow, represents the gradient descent part; the process on the finite state space represents the random subsampling. Processes of this type are, for instance, used to model clonal populations in fluctuating environments. After introducing the stochastic gradient process, we study its theoretical properties. We show that it converges weakly to the gradient flow with respect to the full target function as the learning rate approaches zero. Moreover, we give assumptions under which the stochastic gradient process is exponentially ergodic in the Wasserstein sense. We then additionally assume that the individual target functions are strongly convex and that the learning rate goes to zero sufficiently slowly. In this case, the process converges at an exponential rate to a distribution arbitrarily close to the point mass concentrated at the global minimum of the full target function. We conclude with a discussion of discretisation strategies for the stochastic gradient process and illustrate our concepts in numerical experiments.
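
To make the construction concrete: the stochastic gradient process follows the gradient flow of a single subsampled target f_i, that is, d/dt theta(t) = -grad f_{i(t)}(theta(t)), where the active index i(t) is a continuous-time Markov process that jumps to a uniformly drawn index after i.i.d. exponential waiting times whose mean plays the role of the learning rate. The following minimal Python sketch (not code from the paper) simulates this coupled system with an explicit Euler discretisation, one simple choice among the discretisation strategies the abstract alludes to; the names stochastic_gradient_process, grads and anchors are illustrative, and the step size dt is assumed much smaller than the mean waiting time.

```python
import numpy as np

def stochastic_gradient_process(grads, theta0, T, learning_rate, dt, rng):
    """Explicit Euler simulation of d/dt theta = -grad f_{i(t)}(theta),
    where the active index i(t) is redrawn uniformly from {0, ..., N-1}
    after i.i.d. exponential waiting times with mean `learning_rate`."""
    N = len(grads)
    theta = np.asarray(theta0, dtype=float).copy()
    t = 0.0
    i = rng.integers(N)                         # initially active subsampled target
    next_jump = rng.exponential(learning_rate)  # first switching time
    while t < T:
        if t >= next_jump:                      # resample the active index
            i = rng.integers(N)
            next_jump = t + rng.exponential(learning_rate)
        theta -= dt * grads[i](theta)           # Euler step of the gradient flow
        t += dt
    return theta

# Toy example: f_i(theta) = 0.5 * (theta - a_i)^2, so the full target
# f = mean_i f_i is minimised at the mean of the anchors a_i.
rng = np.random.default_rng(0)
anchors = np.array([-1.0, 0.5, 2.0])
grads = [lambda th, a=a: th - a for a in anchors]
theta_T = stochastic_gradient_process(grads, theta0=[3.0], T=20.0,
                                      learning_rate=0.05, dt=1e-3, rng=rng)
print(theta_T)  # fluctuates near mean(anchors) = 0.5 for small learning rates
```

With these strongly convex quadratic targets, the full target is minimised at the mean of the anchors, so for small mean waiting times the simulated path should settle near that mean, in line with the convergence results stated above.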

research · 12/07/2021
A Continuous-time Stochastic Gradient Descent Method for Continuous Data
Optimization problems with continuous data appear in, e.g., robust machi...

research · 02/07/2023
Convergence rates for momentum stochastic gradient descent with noise of machine learning type
We consider the momentum stochastic gradient descent scheme (MSGD) and i...

research · 03/22/2022
Gradient flows and randomised thresholding: sparse inversion and classification
Sparse inversion and classification problems are ubiquitous in modern da...

research · 09/08/2022
Losing momentum in continuous-time stochastic optimisation
The training of deep neural networks and other modern machine learning m...

research · 09/02/2017
A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations
We consider in this work a system of two stochastic differential equatio...

research · 07/11/2023
Implicit regularisation in stochastic gradient descent: from single-objective to two-player games
Recent years have seen many insights on deep learning optimisation being...

research · 12/05/2018
Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss
Uncertainty sampling, a popular active learning algorithm, is used to re...
