A Critical View of Global Optimality in Deep Learning

02/10/2018
by Chulhee Yun et al.

We investigate the loss surface of deep linear and nonlinear neural networks. We show that for deep linear networks with differentiable losses, critical points under the multilinear parameterization inherit the structure of critical points of the underlying loss with linear parameterization. As corollaries we obtain "local minima are global" results that subsume most previous results, while showing how to distinguish global minima from saddle points. For nonlinear neural networks, we prove two theorems showing that even for networks with one hidden layer, there can be spurious local minima. Indeed, for piecewise linear nonnegative homogeneous activations (e.g., ReLU), we prove that for almost all practical datasets there exist infinitely many local minima that are not global. We conclude by constructing a counterexample involving other activation functions (e.g., sigmoid, tanh, arctan), for which there exists a local minimum strictly inferior to the global minimum.
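
To make the ReLU claim concrete, here is a minimal numerical sketch. It is not the paper's construction; it uses the standard "dead ReLU" trick under assumptions chosen here (squared loss, a 1-D toy dataset), and all names (X, y, W1, b1, w2, b2) are illustrative. It exhibits a one-hidden-layer ReLU network parameter point whose loss cannot be decreased by small perturbations yet is strictly worse than another attainable point, i.e., a spurious local minimum.

```python
# Minimal numerical sketch (NOT the paper's construction): a "dead ReLU"
# point of a one-hidden-layer ReLU network that is a local but not a global
# minimum of the squared loss on a toy dataset.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dataset that no constant predictor fits exactly.
X = np.array([[-1.0], [0.0], [1.0]])   # shape (n, 1)
y = np.array([1.0, 0.0, 1.0])          # y = |x|, not constant

def loss(W1, b1, w2, b2):
    """Squared loss of y_hat = w2 . relu(W1 x + b1) + b2."""
    h = np.maximum(X @ W1.T + b1, 0.0)  # hidden activations, shape (n, k)
    return np.mean((h @ w2 + b2 - y) ** 2)

# Candidate spurious local minimum: every hidden pre-activation is strictly
# negative on all training inputs, so in a neighborhood the ReLUs stay off
# and the network reduces to the constant b2.  With b2 = mean(y), the loss
# cannot be decreased by small perturbations.
k = 2
W1 = np.zeros((k, 1))
b1 = -np.ones(k)                        # pre-activation = -1 < 0 for every x
w2 = rng.normal(size=k)
b2 = float(np.mean(y))

base = loss(W1, b1, w2, b2)
print("loss at dead-ReLU point:", base)   # = Var(y) = 2/9

# Numerical check of local minimality: random small perturbations never help.
eps = 1e-3
for _ in range(1000):
    trial = loss(W1 + eps * rng.normal(size=W1.shape),
                 b1 + eps * rng.normal(size=b1.shape),
                 w2 + eps * rng.normal(size=w2.shape),
                 b2 + eps * rng.normal())
    assert trial >= base - 1e-12

# The point is not global: this network computes relu(x) + relu(-x) = |x| = y.
print("loss at a better point:",
      loss(np.array([[1.0], [-1.0]]), np.zeros(2), np.array([1.0, 1.0]), 0.0))
```

The local-minimality argument behind the perturbation loop is that near the dead-ReLU point all hidden units stay inactive on the training inputs, so the loss depends only on the output bias, which is already at the mean of the targets; since any sufficiently negative b1 and any w2 work, such points come in infinite families, matching the "infinitely many" phrasing above.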

Related research

03/27/2020 - Piecewise linear activations substantially shape the loss surfaces of neural networks
Understanding the loss surface of a neural network is fundamentally impo...

11/20/2018 - Effect of Depth and Width on Local Minima in Deep Learning
In this paper, we analyze the effects of depth and width on the quality ...

12/31/2019 - No Spurious Local Minima in Deep Quadratic Networks
Despite their practical success, a theoretical understanding of the loss...

11/04/2019 - Sub-Optimal Local Minima Exist for Almost All Over-parameterized Neural Networks
Does over-parameterization eliminate sub-optimal local minima for neural...

07/09/2019 - Are deep ResNets provably better than linear predictors?
Recently, a residual network (ResNet) with a single residual block has b...

11/30/2018 - Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks
We present a formal measure-theoretical theory of neural networks (NN) b...

07/28/2021 - Global minimizers, strict and non-strict saddle points, and implicit regularization for deep linear neural networks
In non-convex settings, it is established that the behavior of gradient-...
