Weight Initialization without Local Minima in Deep Nonlinear Neural Networks

06/13/2018
by Tohru Nitta, et al.

In this paper, we propose a new weight initialization method, called even initialization, for wide and deep nonlinear neural networks with the ReLU activation function. We prove that no poor local minimum exists in the initial loss landscape of a wide and deep nonlinear neural network initialized by the proposed even initialization method. Specifically, in the initial loss landscape of such a wide and deep ReLU neural network, the following four statements hold: 1) the loss function is non-convex and non-concave; 2) every local minimum is a global minimum; 3) every critical point that is not a global minimum is a saddle point; and 4) bad saddle points exist. We also show that the weight values produced by the even initialization method are contained in those produced by both the commonly used standard initialization method and the He initialization method.
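The even initialization scheme itself is not specified in this abstract, so as context for the closing comparison, the following is a minimal NumPy sketch of the two baselines the abstract mentions: the commonly used standard (small Gaussian) initialization and He initialization for a fully connected ReLU layer. The function names and the 0.01 standard deviation are illustrative choices, not taken from the paper.

```python
import numpy as np

def standard_init(fan_in, fan_out, rng):
    # "Standard" Gaussian initialization: zero-mean weights with a
    # fixed small standard deviation (0.01 here, a common choice).
    return rng.normal(0.0, 0.01, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng):
    # He initialization (He et al., 2015): variance 2 / fan_in,
    # chosen so that ReLU activations keep a stable variance
    # from layer to layer.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W1 = he_init(784, 1024, rng)  # first layer of a hypothetical wide ReLU network
print(W1.std())               # roughly sqrt(2/784), about 0.0505
```

The paper's claim that even-initialized weight values are contained in those of both baselines concerns the sets of weight values the schemes can produce; the sketch above only illustrates the two reference distributions.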


Related research

05/23/2016  Deep Learning without Poor Local Minima
In this paper, we prove a conjecture published in 1989 and also partiall...

06/12/2019  Semi-flat minima and saddle points by embedding neural networks to overparameterization
We theoretically study the landscape of the training error for neural ne...

09/16/2020  Landscape of Sparse Linear Network: A Brief Investigation
Network pruning, or sparse network has a long history and practical sign...

10/05/2017  Porcupine Neural Networks: (Almost) All Local Optima are Global
Neural networks have been used prominently in several machine learning a...

07/06/2018  The Goldilocks zone: Towards better understanding of neural network loss landscapes
We explore the loss landscape of fully-connected neural networks using r...

11/19/2016  Local minima in training of neural networks
There has been a lot of recent interest in trying to characterize the er...

11/30/2014  The Loss Surfaces of Multilayer Networks
We study the connection between the highly non-convex loss function of a...
