Are Saddles Good Enough for Deep Learning?

06/07/2017
by Adepu Ravi Sankar, et al.

Recent years have seen growing interest in understanding deep neural networks from an optimization perspective. It is now understood that converging to low-cost local minima is sufficient for such models to be effective in practice. In this work, however, we propose a new hypothesis, based on recent theoretical findings and empirical studies, that deep neural network models actually converge to saddle points with high degeneracy. These findings are new and can significantly impact the development of gradient descent based methods for training deep networks. We validate our hypothesis through an extensive experimental evaluation on standard datasets such as MNIST and CIFAR-10, and also show that recent methods designed to escape saddles ultimately converge to saddles with high degeneracy, which we call 'good saddles'. We also verify the well-known Wigner semicircle law in our experimental results.
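As a quick illustration of the Wigner semicircle law the abstract refers to (this is a generic random-matrix sketch in NumPy, not the paper's code or its Hessian experiments): the eigenvalues of a large random symmetric matrix with independent Gaussian entries, scaled by 1/sqrt(n), concentrate on the interval [-2, 2] with a semicircular density.

```python
import numpy as np

# Build an n x n symmetric random matrix by symmetrizing an
# iid standard-normal matrix (a GOE-like ensemble).
rng = np.random.default_rng(0)
n = 1000
m = rng.standard_normal((n, n))
h = (m + m.T) / np.sqrt(2.0)  # symmetric; off-diagonal entries ~ N(0, 1)

# Wigner's semicircle law: as n grows, the spectrum of h / sqrt(n)
# fills [-2, 2] with density sqrt(4 - x^2) / (2 * pi).
eigs = np.linalg.eigvalsh(h) / np.sqrt(n)

print(eigs.min(), eigs.max())  # both close to the edges -2 and 2 for large n
```

In the paper's setting, the analogous object is the eigenvalue spectrum of the training-loss Hessian; a large cluster of near-zero eigenvalues is what "high degeneracy" of a saddle refers to.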


Related research

Frank-Wolfe optimization for deep networks (06/06/2020)
Deep neural networks are today one of the most popular choices in classif...

A global analysis of global optimisation (10/10/2022)
Theoretical understanding of the training of deep neural networks has ma...

Simple2Complex: Global Optimization by Gradient Descent (05/02/2016)
A method named simple2complex for modeling and training deep neural netw...

On the Analysis of Trajectories of Gradient Descent in the Optimization of Deep Neural Networks (07/21/2018)
Theoretical analysis of the error landscape of deep neural networks has ...

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks (10/02/2020)
Natural Gradient Descent (NGD) helps to accelerate the convergence of gr...

Unsupervised Pretraining Encourages Moderate-Sparseness (12/20/2013)
It is well known that direct training of deep neural networks will gener...

Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training (12/30/2020)
Deep learning algorithms are responsible for a technological revolution ...
