Topology and Geometry of Half-Rectified Network Optimization

11/04/2016
by C. Daniel Freeman, et al.

The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of strongly simplifying the nonlinear nature of the model. In this work, we do not make any such assumption and study conditions on the data distribution and model architecture that prevent the existence of bad local minima. Our theoretical work quantifies and formalizes two important folklore facts: (i) the landscape of deep linear networks has a radically different topology from that of deep half-rectified ones, and (ii) the energy landscape in the non-linear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model over-parametrization. Our main theoretical contribution is to prove that half-rectified single-layer networks are asymptotically connected, and we provide explicit bounds that reveal the aforementioned interplay. The conditioning of gradient descent is the next challenge we address. We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets on large-scale networks. Our empirical results show that these level sets remain connected throughout the learning phase, suggesting near-convex behavior, but they become exponentially more curved as the energy level decays, in accordance with what is observed in practice with very low curvature attractors.
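
To make the level-set connectedness claim concrete, the sketch below probes it numerically on a toy one-hidden-layer ReLU regression problem: two networks are trained independently to roughly the same loss level, and a piecewise-linear path between them is refined by bisection, with each midpoint pushed back toward the level set by a few gradient steps, so that the maximum loss along the path can be compared with the loss at the endpoints. This is only a minimal illustration of the idea, not the path-sampling algorithm used in the paper; the toy data, network sizes, step counts, and the midpoint-refinement heuristic are all assumptions chosen for brevity.

```python
# Hypothetical sketch: probe whether two low-loss minima of a small
# one-hidden-layer ReLU network are connected by a low-loss path.
# Data, sizes, and the refinement heuristic are illustrative assumptions,
# not the procedure from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for the "data distribution" in the abstract.
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=(8, 1))
y = np.maximum(X @ w_true, 0.0) + 0.1 * rng.normal(size=(256, 1))

HIDDEN = 32  # over-parametrized relative to the 8-dimensional input

def unpack(theta):
    """Split a flat parameter vector into the two weight matrices (W1, W2)."""
    k = 8 * HIDDEN
    return theta[:k].reshape(8, HIDDEN), theta[k:].reshape(HIDDEN, 1)

def loss(theta):
    """Mean-squared error of the one-hidden-layer ReLU network."""
    W1, W2 = unpack(theta)
    pred = np.maximum(X @ W1, 0.0) @ W2
    return float(np.mean((pred - y) ** 2))

def grad(theta):
    """Analytic gradient of the loss with respect to the flat parameters."""
    W1, W2 = unpack(theta)
    Z = X @ W1
    H = np.maximum(Z, 0.0)
    dpred = 2.0 * (H @ W2 - y) / len(X)
    dW2 = H.T @ dpred
    dZ = (dpred @ W2.T) * (Z > 0)
    dW1 = X.T @ dZ
    return np.concatenate([dW1.ravel(), dW2.ravel()])

def train(theta, steps=3000, lr=0.05):
    """Plain gradient descent to reach a low-loss endpoint model."""
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

def path_max_loss(a, b, depth=4, n_grid=9):
    """Recursively bisect the segment a -> b, nudging each midpoint downhill
    so the path can bend around barriers, and return the maximum loss seen
    along the resulting piecewise-linear path."""
    if depth == 0:
        return max(loss((1 - t) * a + t * b) for t in np.linspace(0.0, 1.0, n_grid))
    mid = 0.5 * (a + b)
    for _ in range(50):  # pull the midpoint back toward the low-loss level set
        mid = mid - 0.05 * grad(mid)
    return max(path_max_loss(a, mid, depth - 1, n_grid),
               path_max_loss(mid, b, depth - 1, n_grid))

# Two independently trained networks at (roughly) the same loss level.
theta_a = train(0.1 * rng.normal(size=8 * HIDDEN + HIDDEN))
theta_b = train(0.1 * rng.normal(size=8 * HIDDEN + HIDDEN))
print("endpoint losses:", loss(theta_a), loss(theta_b))
print("max loss, straight line:",
      max(loss((1 - t) * theta_a + t * theta_b) for t in np.linspace(0, 1, 33)))
print("max loss, refined path:", path_max_loss(theta_a, theta_b))
```

If the refined path's maximum loss stays close to the endpoint losses while the straight-line interpolation shows a barrier, that is the kind of evidence for connected, but increasingly curved, level sets that the abstract describes.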

Related research:

- No Spurious Local Minima in Deep Quadratic Networks (12/31/2019): Despite their practical success, a theoretical understanding of the loss...
- Geometry of energy landscapes and the optimizability of deep neural networks (08/01/2018): Deep neural networks are workhorse models in machine learning with multi...
- On generalization bounds for deep networks based on loss surface implicit regularization (01/12/2022): The classical statistical learning theory says that fitting too many par...
- Deep Learning without Poor Local Minima (05/23/2016): In this paper, we prove a conjecture published in 1989 and also partiall...
- FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks (04/09/2022): Despite their effective use in various fields, many aspects of neural ne...
- Charting the Topography of the Neural Network Landscape with Thermal-Like Noise (04/03/2023): The training of neural networks is a complex, high-dimensional, non-conv...
- Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys (02/18/2018): Neural networks provide a rich class of high-dimensional, non-convex opt...
