On the Convergence of SGD Training of Neural Networks

08/12/2015
by Thomas M. Breuel, et al.

Neural networks are usually trained by some form of stochastic gradient descent (SGD). A number of strategies intended to improve SGD optimization are in common use, such as learning rate schedules, momentum, and batching. These are motivated by ideas about the occurrence of local minima at different scales, valleys, and other phenomena in the objective function. The empirical results presented here suggest that these phenomena are not significant factors in SGD optimization of MLP-related objective functions, and that the behavior of stochastic gradient descent in these problems is better described as the simultaneous convergence, at different rates, of many largely non-interacting subproblems.

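For reference, the following is a minimal, self-contained sketch of the generic SGD refinements the abstract mentions (a learning rate schedule, classical momentum, and noisy mini-batch-style gradients), applied to a toy quadratic. It is an illustration of the standard techniques only, not the authors' experimental setup; the function names and hyperparameters (lr0, decay, mu) are assumptions made for the example.

import numpy as np

def lr_schedule(step, lr0=0.1, decay=1e-3):
    # Simple inverse-time decay: the learning rate shrinks as training proceeds.
    return lr0 / (1.0 + decay * step)

def sgd_momentum_step(w, grad, velocity, lr, mu=0.9):
    # Classical momentum: keep a decaying running sum of past gradient steps.
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Toy usage: minimize ||w||^2 from noisy (mini-batch-like) gradient estimates.
rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for step in range(2000):
    grad = 2.0 * w + rng.normal(scale=0.1, size=w.shape)  # true gradient plus noise
    w, v = sgd_momentum_step(w, grad, v, lr_schedule(step))
print(w)  # ends up near the minimizer at the origin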

