Geometry of energy landscapes and the optimizability of deep neural networks

08/01/2018
by Simon Becker, et al.

Deep neural networks are workhorse models in machine learning, with multiple layers of non-linear functions composed in series. Their loss function is highly non-convex, yet empirically even gradient descent minimisation is sufficient to arrive at accurate and predictive models. It is hitherto unknown why deep neural networks are so easily optimizable. We analyze the energy landscape of a spin glass model of deep neural networks using random matrix theory and algebraic geometry. We show analytically that the multilayered structure holds the key to optimizability: with the number of parameters fixed, increasing network depth decreases the number of stationary points in the loss function, clusters minima more tightly in parameter space, and softens the tradeoff between the depth and width of minima. Our analytical results are verified numerically against neural networks trained on a set of classical benchmark datasets. Our model uncovers generic design principles of machine learning models.
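As a rough illustration of the experiment the abstract describes, the sketch below (hypothetical code, not the authors' implementation) trains a shallow and a deep fully connected network with an approximately matched parameter budget using plain full-batch gradient descent. The widths, the synthetic data, and the hyperparameters are all assumptions chosen for illustration; on the paper's benchmarks one would substitute a real dataset.

```python
# Hypothetical sketch: fixed parameter budget, varying depth.
import torch
import torch.nn as nn

torch.manual_seed(0)

def mlp(widths):
    # Fully connected ReLU network defined by a list of layer widths.
    layers = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(d_in, d_out), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing ReLU on the output

# Two architectures with a roughly matched parameter budget (~3.9k each):
shallow = mlp([32, 90, 10])        # one wide hidden layer
deep = mlp([32, 34, 34, 34, 10])   # three narrower hidden layers
for name, net in [("shallow", shallow), ("deep", deep)]:
    print(name, sum(p.numel() for p in net.parameters()), "parameters")

# Synthetic classification data stands in for a benchmark dataset.
X = torch.randn(512, 32)
y = torch.randint(0, 10, (512,))

def train(net, steps=2000, lr=0.05):
    # Plain full-batch gradient descent (SGD optimizer, no momentum).
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(net(X), y)
        loss.backward()
        opt.step()
    return loss.item()

print("shallow final loss:", train(shallow))
print("deep final loss:", train(deep))
```

Under the paper's analysis, the deeper network should be no harder to optimize than the shallow one despite the identical budget; comparing the two final losses (or full loss curves) over several random seeds is the kind of numerical check the abstract refers to.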


Related research:

- 12/31/2019: Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity. "Traditional landscape analysis of deep neural networks aims to show that..."
- 04/14/2023: Who breaks early, looses: goal oriented training of deep neural networks based on port Hamiltonian dynamics. "The highly structured energy landscape of the loss as a function of para..."
- 11/04/2016: Topology and Geometry of Half-Rectified Network Optimization. "The loss surface of deep neural networks has recently attracted interest..."
- 02/07/2022: Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry. "We systematize the approach to the investigation of deep neural network..."
- 07/13/2021: How many degrees of freedom do we need to train deep networks: a loss landscape perspective. "A variety of recent works, spanning pruning, lottery tickets, and traini..."
- 10/17/2018: The loss surface of deep linear networks viewed through the algebraic geometry lens. "By using the viewpoint of modern computational algebraic geometry, we ex..."
- 05/22/2023: Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model. "Neural collapse (NC) refers to the surprising structure of the last laye..."
