The critical locus of overparameterized neural networks

05/08/2020
by   Y Cooper, et al.

Many aspects of the geometry of loss functions in deep learning remain mysterious. In this paper, we work toward a better understanding of the geometry of the loss function L of overparameterized feedforward neural networks. In this setting, we identify several components of the critical locus of L and study their geometric properties. For networks of depth ℓ ≥ 4, we identify a locus of critical points we call the star locus S. Within S we identify a positive-dimensional sublocus C with the property that every p ∈ C is a degenerate critical point, and no existing theoretical result guarantees that gradient descent will not converge to p. For very wide networks, we build on earlier work and show that all critical points of L are degenerate, and we give lower bounds on the number of zero eigenvalues of the Hessian at each critical point. For networks that are both deep and very wide, we compare the growth rates of the zero eigenspaces of the Hessian across all the families of critical points that we identify. The results in this paper provide a starting point for a more quantitative understanding of the properties of the various components of the critical locus of L.
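The phenomenon of degenerate critical points in overparameterized models can be seen in a toy example not taken from the paper: a two-parameter "deep linear" model f(x) = w2·w1·x fit to a single data point. Its global minima form a curve, so the Hessian at any minimum must have a zero eigenvalue along that curve. The sketch below (a hypothetical illustration, using a finite-difference Hessian) checks this numerically:

```python
import numpy as np

# Toy overparameterized model: f(x) = w2 * w1 * x, one data point (x=1, y=1).
# Loss: L(w1, w2) = (w1*w2 - 1)^2. The global minima form the curve w1*w2 = 1,
# a positive-dimensional set, so every minimum is a degenerate critical point:
# the Hessian has a zero eigenvalue in the direction tangent to the curve.

def loss(w):
    w1, w2 = w
    return (w1 * w2 - 1.0) ** 2

def hessian(f, w, eps=1e-4):
    """Central finite-difference Hessian of f at the point w."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(w + ei + ej) - f(w + ei - ej)
                       - f(w - ei + ej) + f(w - ei - ej)) / (4 * eps ** 2)
    return H

w_min = np.array([1.0, 1.0])  # a point on the minimum curve w1*w2 = 1
eigvals = np.linalg.eigvalsh(hessian(loss, w_min))
print(eigvals)  # approximately [0, 4]: one flat (zero) direction, one curved
```

At (w1, w2) = (1, 1) the exact Hessian is [[2, 2], [2, 2]], with eigenvalues 0 and 4; the zero eigenvalue corresponds to moving along the curve of minima without changing the loss. The paper's results concern the analogous (much higher-dimensional) zero eigenspaces in deep, wide networks.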


Related research

- Embedding Principle: a hierarchical structure of loss landscape of deep neural networks (11/30/2021)
  We prove a general Embedding Principle of loss landscape of deep neural ...

- Numerically Recovering the Critical Points of a Deep Linear Autoencoder (01/29/2019)
  Numerically locating the critical points of non-convex surfaces is a lon...

- Critical Points of Neural Networks: Analytical Forms and Landscape Properties (10/30/2017)
  Due to the success of deep learning to solving a variety of challenging ...

- Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses (03/23/2020)
  Despite the fact that the loss functions of deep neural networks are hig...

- Visualizing high-dimensional loss landscapes with Hessian directions (08/28/2022)
  Analyzing geometric properties of high-dimensional loss functions, such ...

- On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width (01/14/2020)
  The Hessian of neural networks can be decomposed into a sum of two matri...

- Learning One-hidden-layer Neural Networks under General Input Distributions (10/09/2018)
  Significant advances have been made recently on training neural networks...
