Piecewise linear activations substantially shape the loss surfaces of neural networks

03/27/2020
by Fengxiang He, et al.

Understanding the loss surface of a neural network is fundamentally important to understanding deep learning. This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks. We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, defined as local minima whose empirical risk is higher than that of the global minima. This result demonstrates that networks with piecewise linear activations differ substantially from the well-studied linear neural networks. It holds for networks of arbitrary depth with arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions used in practice. The underlying assumptions are consistent with most practical circumstances: the output layer is required to be narrower than every hidden layer. In addition, the loss surface of a neural network with piecewise linear activations is partitioned into multiple smooth, multilinear cells by nondifferentiable boundaries. The constructed spurious local minima are concentrated in one cell as a valley: they are connected to one another by a continuous path along which the empirical risk is invariant. Further, for one-hidden-layer networks, we prove that all local minima within a cell constitute an equivalence class, are concentrated in a valley, and are all global minima within that cell.
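As a concrete illustration of the valley behaviour described above, here is a minimal NumPy sketch. It is not the paper's construction: the toy data, the width-5 network, and the helper `risk` are all hypothetical. It uses the well-known positive-rescaling invariance of ReLU networks to trace a continuous path of parameters on which the activation pattern (and hence the cell) and the empirical risk are both unchanged.

```python
# Minimal sketch (illustrative, not the paper's construction): along a
# positive-rescaling path, a one-hidden-layer ReLU network keeps the same
# activation pattern (the same cell) and the same empirical risk.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 20 samples, 3 features, scalar targets.
X = rng.normal(size=(20, 3))
y = rng.normal(size=(20, 1))

# One hidden layer of width 5 with ReLU, a piecewise linear activation.
W1 = rng.normal(size=(3, 5))
b1 = rng.normal(size=(5,))
W2 = rng.normal(size=(5, 1))

def risk(W1, b1, W2):
    """Empirical risk under the squared loss."""
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    return float(np.mean((h @ W2 - y) ** 2))

base = risk(W1, b1, W2)
pattern = X @ W1 + b1 > 0  # activation pattern indexing the current cell

# ReLU is positively homogeneous: ReLU(c*z)/c == ReLU(z) for c > 0, so
# scaling unit 0's incoming weights by c and its outgoing weight by 1/c
# traces a continuous path on which the empirical risk is invariant.
for c in (0.5, 2.0, 10.0):
    W1c, b1c, W2c = W1.copy(), b1.copy(), W2.copy()
    W1c[:, 0] *= c
    b1c[0] *= c
    W2c[0, :] /= c
    assert np.array_equal(X @ W1c + b1c > 0, pattern)  # same cell
    assert np.isclose(risk(W1c, b1c, W2c), base)       # same risk

print(f"risk along the rescaling path: {base:.6f} (constant)")
```

Rescaling invariance is one familiar source of such flat, risk-invariant directions in ReLU loss surfaces; it is consistent with, though much weaker than, the paper's result that the constructed spurious local minima form a connected valley inside a single multilinear cell.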


