SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

02/11/2018
by Lam M. Nguyen, et al.

Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical analysis of convergence of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016) a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds. Moreover, we propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of a diminishing learning rate.
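For a concrete reference point, the sketch below shows plain SGD with a diminishing step size on a finite-sum objective, the setting the abstract refers to. The 1/(1+t) schedule, the function names, and the least-squares example are illustrative assumptions for this sketch, not the exact schedule or problem analyzed in the paper.

```python
import numpy as np

def sgd_diminishing(grad_i, w0, n, num_epochs=10, eta0=0.1):
    """Plain SGD with a diminishing step size eta_t = eta0 / (1 + t).

    grad_i(w, i) -- stochastic gradient of the i-th component f_i at w
    w0           -- initial iterate
    n            -- number of component functions in the finite sum
    """
    w = w0.copy()
    t = 0
    rng = np.random.default_rng(0)
    for _ in range(num_epochs):
        for i in rng.permutation(n):          # visit components in random order
            eta_t = eta0 / (1.0 + t)          # diminishing learning rate (illustrative schedule)
            w -= eta_t * grad_i(w, i)         # stochastic gradient step
            t += 1
    return w

# Illustrative example: least squares, f_i(w) = 0.5 * (a_i^T w - b_i)^2
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, b = rng.normal(size=(200, 5)), rng.normal(size=200)
    grad = lambda w, i: (A[i] @ w - b[i]) * A[i]
    w_final = sgd_diminishing(grad, np.zeros(5), n=200)
    print("final objective:", 0.5 * np.mean((A @ w_final - b) ** 2))
```

Hogwild! performs the same per-component updates asynchronously from multiple workers that read and write the shared iterate without locking; the paper's contribution is proving convergence of both procedures under a bound on the stochastic gradients relative to the true gradient norm, rather than a uniform bound.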

