Differential Equations for Modeling Asynchronous Algorithms

05/08/2018
by Li He, et al.

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analyses of ASGD take a discrete view and prove upper bounds on convergence rates. However, the discrete view has intrinsic limitations: it gives no characterization of the optimization path, and the proof techniques are induction-based and thus usually complicated. Inspired by the recent successful adoption of stochastic differential equations (SDE) in the theoretical analysis of SGD, in this paper we study the continuous approximation of ASGD by using stochastic differential delay equations (SDDE). We introduce the approximation method and study the approximation error. We then analyze the convergence rates of the ASGD algorithm based on the continuous approximation. Two methods can be used to analyze the convergence rates: moment estimation and energy function minimization. Moment estimation depends on the specific form of the loss function, whereas energy function minimization relies only on the convexity of the loss function and does not depend on its specific form. In addition to the convergence analysis, the continuous view also helps us derive better convergence rates. All of this clearly shows the advantage of taking the continuous view of gradient descent algorithms.
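For concreteness, here is a minimal sketch of the kind of continuous approximation described above; the notation is assumed for illustration and is not taken from the paper. With learning rate $\eta$, random sample $\xi_k$, and gradient staleness $\tau_k$, the ASGD iteration can be written as
$$w_{k+1} = w_k - \eta \, \nabla f\big(w_{k-\tau_k}; \xi_k\big),$$
and a natural SDDE surrogate with constant delay $\tau$ takes the form
$$\mathrm{d}W(t) = -\nabla F\big(W(t-\tau)\big)\,\mathrm{d}t + \sqrt{\eta}\,\sigma\big(W(t-\tau)\big)\,\mathrm{d}B(t),$$
where $F$ is the expected loss, $\sigma$ models the gradient noise level, and $B(t)$ is standard Brownian motion. In an energy-function style of analysis, one would then track a Lyapunov quantity such as $\mathcal{E}(t) = \mathbb{E}\,\lVert W(t) - w^\ast \rVert^2$ for a minimizer $w^\ast$, relying only on the convexity of $F$ rather than its specific form.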


