Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

10/18/2017
by Yi Zhou, et al.

The past decade has witnessed the successful application of deep learning to many challenging problems in machine learning and artificial intelligence. However, the loss functions of deep neural networks, especially nonlinear ones, remain poorly understood from a theoretical standpoint. In this paper, we enrich the current understanding of the landscape of the square loss function for three types of neural networks. Specifically, when the parameter matrices are square, we provide an explicit characterization of the global minimizers of linear networks, linear residual networks, and nonlinear networks with one hidden layer. We then establish two quadratic-type landscape properties of the square loss for these networks: the gradient dominance condition in a neighborhood of their full-rank global minimizers, and the regularity condition along certain directions in a neighborhood of their global minimizers. Both properties are desirable for optimization around the global minimizers of the loss function.
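To make the gradient dominance condition concrete, the sketch below numerically checks the inequality ||∇f||² ≥ 2μ(f − f*) for the square loss of a two-layer linear network near a full-rank global minimizer. This is an illustration, not the paper's construction: the network size, the target matrix `Y`, and the perturbation scale are all arbitrary assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Two-layer linear network with square weight matrices and input X = I,
# so the square loss is f(W1, W2) = 0.5 * ||W2 @ W1 - Y||_F^2.
Y = rng.standard_normal((d, d)) + 3 * np.eye(d)  # a full-rank target (assumption)

def loss(W1, W2):
    R = W2 @ W1 - Y
    return 0.5 * np.sum(R * R)

def grads(W1, W2):
    R = W2 @ W1 - Y
    return W2.T @ R, R @ W1.T  # dL/dW1, dL/dW2

# One full-rank global minimizer: W1 = I, W2 = Y, where f* = 0.
W1_star, W2_star = np.eye(d), Y.copy()

# Sample random points in a small neighborhood of the minimizer and record
# ||grad f||^2 / (2 * (f - f*)); gradient dominance says this ratio is
# bounded below by some mu > 0.
ratios = []
for _ in range(200):
    W1 = W1_star + 0.05 * rng.standard_normal((d, d))
    W2 = W2_star + 0.05 * rng.standard_normal((d, d))
    g1, g2 = grads(W1, W2)
    grad_sq = np.sum(g1 * g1) + np.sum(g2 * g2)
    gap = loss(W1, W2)  # f - f*, since f* = 0 here
    ratios.append(grad_sq / (2 * gap))

print("empirical lower bound on mu:", min(ratios))
```

Gradient dominance (also known as the Polyak-Łojasiewicz condition) is weaker than strong convexity, yet it already guarantees linear convergence of gradient descent, which is why establishing it near the global minimizers is useful.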

Related research

- Critical Points of Neural Networks: Analytical Forms and Landscape Properties (10/30/2017)
- Global optimality conditions for deep neural networks (07/08/2017)
- Tilting the playing field: Dynamical loss functions for machine learning (02/07/2021)
- Learning in Gated Neural Networks (06/06/2019)
- Learning One-hidden-layer Neural Networks under General Input Distributions (10/09/2018)
- Porcupine Neural Networks: (Almost) All Local Optima are Global (10/05/2017)
- On the Power of Over-parametrization in Neural Networks with Quadratic Activation (03/03/2018)
