SGD May Never Escape Saddle Points

07/25/2021
by Liu Ziyin, et al.

Stochastic gradient descent (SGD) has been deployed to solve highly non-linear and non-convex machine learning problems such as the training of deep neural networks. However, previous works on SGD often rely on restrictive and unrealistic assumptions about the nature of its noise. In this work, we mathematically construct examples that defy the previous understanding of SGD. Our constructions show that: (1) SGD may converge to a local maximum; (2) SGD may escape a saddle point arbitrarily slowly; (3) SGD may prefer sharp minima over flat ones; and (4) AMSGrad may converge to a local maximum. These results suggest that the noise structure of SGD might be more important than the loss landscape in neural network training, and that future research should focus on deriving the actual noise structure in deep learning.
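
To make claim (1) concrete, below is a minimal sketch, not code from the paper: it assumes a hypothetical one-dimensional objective averaging two per-sample losses, l1(x) = 0.5 x^2 and l2(x) = -x^2, so the average loss L(x) = -0.25 x^2 has a maximum at x = 0. Both per-sample gradients vanish at 0, so the SGD noise is state-dependent and vanishes there too, and for a suitable step size the iterates contract onto the maximum.

```python
# Toy sketch (illustrative, not the paper's code): with state-dependent
# gradient noise, one-sample SGD can converge to a local MAXIMUM of the
# average loss. The two per-sample losses below are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)

def grad_l1(x):
    return x            # gradient of l1(x) = 0.5 * x**2 (convex sample)

def grad_l2(x):
    return -2.0 * x     # gradient of l2(x) = -x**2 (concave sample)

# Average loss L(x) = (l1 + l2) / 2 = -0.25 * x**2: a maximum at x = 0,
# where the full-batch gradient (and hence the gradient noise) vanishes.
eta = 0.8               # step size chosen so E[log |multiplier|] < 0
steps = 500
x = np.full(1000, 1.0)  # 1000 independent trajectories starting at x = 1

for _ in range(steps):
    # Minibatch of size 1: pick one of the two samples uniformly.
    pick = rng.integers(0, 2, size=x.shape)
    g = np.where(pick == 0, grad_l1(x), grad_l2(x))
    x = x - eta * g

# Prints a tiny value (around 1e-70): the iterates have collapsed onto
# the local maximum x = 0 instead of escaping it.
print("median |x_T| after", steps, "steps:", np.median(np.abs(x)))
```

The contraction happens because each SGD step multiplies x by either 1 - eta = 0.2 or 1 + 2*eta = 2.6, and the mean log-multiplier 0.5*(log 0.2 + log 2.6) is about -0.33 < 0, so |x_t| shrinks geometrically even though full-batch gradient descent would drift away from the maximum.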


Related research

06/24/2020 - Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Stochastic gradient descent (SGD) and its variants are mainstream method...

06/02/2022 - Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions
Generalization is one of the most important problems in deep learning (D...

04/18/2018 - A Mean Field View of the Landscape of Two-Layers Neural Networks
Multi-layer neural networks are among the most powerful models in machin...

11/07/2021 - Quasi-potential theory for escape problem: Quantitative sharpness effect on SGD's escape from local minima
We develop a quantitative theory on an escape problem of a stochastic gr...

11/06/2016 - Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
This paper proposes a new optimization algorithm called Entropy-SGD for ...

02/28/2017 - Learning What Data to Learn
Machine learning is essentially the science of playing with data. An ad...

06/08/2015 - Path-SGD: Path-Normalized Optimization in Deep Neural Networks
We revisit the choice of SGD for training deep neural networks by recons...
