The layer-wise L1 Loss Landscape of Neural Nets is more complex around local minima

05/06/2021
by Peter Hinz, et al.

For fixed training data and fixed parameters in the other layers, the L1 loss of a ReLU neural network, viewed as a function of the first layer's parameters, is a piecewise affine function. We use the Deep ReLU Simplex algorithm to minimize this loss monotonically by moving between adjacent vertices of the piecewise-affine structure, and we analyze the trajectory of the visited vertex positions. Empirically, the iterations behave differently in a neighbourhood of a local minimum, so that conclusions about the loss level and the proximity of the local minimum can be drawn before it is actually reached: firstly, the loss appears to decay approximately exponentially along the sequence of adjacent vertices, so that the loss level at the local minimum can be estimated from the loss levels of successive iterates; secondly, we observe a strong increase in vertex density around local minima. These observations could have far-reaching consequences for the design of new gradient-descent algorithms that exploit them to improve convergence rates.
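The observation that the vertex losses decay roughly geometrically near a local minimum suggests that the limiting loss level can be extrapolated from a few successive iterates. The sketch below is not the paper's Deep ReLU Simplex algorithm; it only illustrates one standard way (Aitken's delta-squared extrapolation) to estimate the limit L* of a sequence l_k that is assumed to behave like L* + C * rho^k with 0 < rho < 1. The function name and the synthetic trajectory are purely illustrative.

```python
import numpy as np

def estimate_limit_loss(losses):
    """Aitken delta-squared extrapolation.

    Assumes the last three loss values behave like l_k ~ L* + C * rho**k
    (approximately geometric decay towards the local-minimum loss L*)
    and returns an estimate of L*.
    """
    l0, l1, l2 = losses[-3], losses[-2], losses[-1]
    denom = l2 - 2.0 * l1 + l0
    if abs(denom) < 1e-12:   # decay too flat to extrapolate reliably
        return l2
    return l2 - (l2 - l1) ** 2 / denom

if __name__ == "__main__":
    # Synthetic vertex-loss trajectory decaying geometrically towards L* = 0.3
    L_star, C, rho = 0.3, 2.0, 0.8
    losses = [L_star + C * rho ** k for k in range(10)]
    print("last observed loss:", losses[-1])
    print("extrapolated L*   :", estimate_limit_loss(losses))  # close to 0.3
```

For an exactly geometric sequence the extrapolation recovers L* exactly; for the empirically observed, only approximately geometric decay it gives an estimate that can be refined as further vertices are visited.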

Related research

03/19/2021
Landscape analysis for shallow ReLU neural networks: complete classification of critical points for affine target functions
In this paper, we analyze the landscape of the true loss of a ReLU neura...

04/06/2023
Training a Two Layer ReLU Network Analytically
Neural networks are usually trained with different variants of gradient ...

12/03/2017
Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
We consider the problem of learning a one-hidden-layer neural network wi...

06/10/2020
Is the Skip Connection Provable to Reform the Neural Network Loss Landscape?
The residual network is now one of the most effective structures in deep...

02/01/2019
Passed & Spurious: analysing descent algorithms and local minima in spiked matrix-tensor model
In this work we analyse quantitatively the interplay between the loss la...

10/01/2022
A Combinatorial Perspective on the Optimization of Shallow ReLU Networks
The NP-hard problem of optimizing a shallow ReLU network can be characte...

02/16/2021
Message Passing Descent for Efficient Machine Learning
We propose a new iterative optimization method for the Data-Fitting (DF)...
