Deep Learning without Poor Local Minima

05/23/2016
by Kenji Kawaguchi

In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. Without any unrealistic assumptions, we first prove the following statements for the squared loss function of deep linear neural networks with any depth and any widths: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) there exist "bad" saddle points (where the Hessian has no negative eigenvalue) for deeper networks (with more than three layers), whereas there are no bad saddle points for shallow networks (with three layers). Moreover, for deep nonlinear neural networks, we prove the same four statements via a reduction to a deep linear model under the independence assumption adopted from recent work. As a result, we present an instance for which we can answer the following question: how difficult is it to directly train a deep model in theory? It is more difficult than training classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima). Furthermore, the mathematically proven existence of bad saddle points for deeper models suggests a possible open problem. We note that even though we have advanced the theoretical foundations of deep learning and non-convex optimization, there is still a gap between theory and practice.
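The objective behind the four statements is the squared loss of a deep linear network, L(W_1, ..., W_H) = (1/2) ||W_H ... W_1 X - Y||_F^2. As a rough numerical illustration of statement 4 (not taken from the paper; the data, layer widths, and the finite-difference helper below are hypothetical choices), the following sketch estimates the Hessian of this loss at the all-zero critical point: with two weight matrices (a three-layer network) the smallest eigenvalue comes out negative, so the saddle has an escape direction of negative curvature, whereas with three weight matrices (more than three layers) no negative eigenvalue appears even though the point is not a global minimum, which is the bad-saddle behavior described above.

```python
# Minimal numerical sketch (illustrative only, not code from the paper).
# It estimates the Hessian of the squared loss of a deep linear network
# at the all-zero critical point and reports its smallest eigenvalue.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))   # inputs,  shape (d_in, n)  -- arbitrary toy data
Y = rng.standard_normal((2, 5))   # targets, shape (d_out, n) -- arbitrary toy data

def loss(weights):
    """Squared loss 0.5 * ||W_H ... W_1 X - Y||_F^2 of a deep linear network."""
    P = X
    for W in weights:
        P = W @ P
    return 0.5 * np.sum((P - Y) ** 2)

def hessian_at_zero(shapes, eps=1e-3):
    """Central-difference Hessian of the loss w.r.t. all weight entries, at W_i = 0."""
    sizes = [int(np.prod(s)) for s in shapes]
    dim = sum(sizes)

    def f(v):  # loss as a function of one flat parameter vector
        ws, offset = [], 0
        for s, k in zip(shapes, sizes):
            ws.append(v[offset:offset + k].reshape(s))
            offset += k
        return loss(ws)

    I = np.eye(dim)
    H = np.zeros((dim, dim))
    for i in range(dim):
        for j in range(dim):
            a, b = eps * I[i], eps * I[j]
            H[i, j] = (f(a + b) - f(a - b) - f(b - a) + f(-a - b)) / (4 * eps ** 2)
    return H

# "Three layers" (two weight matrices): W = 0 is a strict saddle,
# so the Hessian has a negative eigenvalue.
shallow = [(3, 2), (2, 3)]
# "More than three layers" (three weight matrices): W = 0 is a bad saddle,
# i.e. not a global minimum, yet the Hessian has no negative eigenvalue
# (here it is exactly zero, up to finite-difference noise).
deep = [(3, 2), (3, 3), (2, 3)]

for name, shapes in (("shallow", shallow), ("deeper", deep)):
    eig_min = np.linalg.eigvalsh(hessian_at_zero(shapes)).min()
    print(f"{name:>7}: smallest Hessian eigenvalue at W = 0 is {eig_min:+.2e}")
```

The central-difference Hessian is used only to keep the sketch dependency-free; a Hessian from an automatic-differentiation library would serve equally well.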


