The Landscape of Deep Learning Algorithms

05/19/2017
by Pan Zhou et al.

This paper studies the landscape of the empirical risk of deep neural networks by theoretically analyzing its convergence to the population risk, along with its stationary points and their properties. For an l-layer linear neural network, we prove that the empirical risk uniformly converges to the population risk at the rate O(r^{2l}√(d log(l))/√n), where n is the training sample size, d is the total weight dimension, and r bounds the magnitude of each layer's weights. From this result we derive stability and generalization bounds for the empirical risk. We also establish uniform convergence of the gradient of the empirical risk to its population counterpart, and we prove a one-to-one correspondence, with convergence guarantees, between the non-degenerate stationary points of the empirical and population risks, which describes the landscape of deep neural networks. We then analyze these properties for deep nonlinear neural networks with sigmoid activation functions, proving analogous convergence results for their empirical risks and gradients and characterizing their non-degenerate stationary points. To the best of our knowledge, this is the first work to theoretically characterize the landscape of deep learning algorithms. Our results also provide the sample complexity of training a good deep neural network, and offer theoretical understanding of how the network depth l, the layer width, the network size d, and the parameter magnitude determine the neural network landscape.
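To make the stated rate concrete, here is a minimal numerical sketch, reading the bound as O(r^{2l}√(d log l)/√n). The function name and the constant `c` standing in for the hidden factor of the O(·) are hypothetical illustration choices, not from the paper:

```python
import math

def risk_gap_bound(n, d, l, r, c=1.0):
    """Illustrative value of c * r^(2l) * sqrt(d * log(l)) / sqrt(n),
    the abstract's uniform-convergence rate between empirical and
    population risk. The constant c is a hypothetical placeholder."""
    return c * (r ** (2 * l)) * math.sqrt(d * math.log(l)) / math.sqrt(n)

# More samples tighten the gap at a 1/sqrt(n) rate, while extra depth
# inflates it through the r^(2l) term whenever r > 1.
shallow = risk_gap_bound(n=10_000, d=1_000, l=3, r=1.1)
deep = risk_gap_bound(n=10_000, d=1_000, l=10, r=1.1)
more_data = risk_gap_bound(n=40_000, d=1_000, l=3, r=1.1)
assert deep > shallow and more_data < shallow
```

This mirrors the abstract's qualitative claim: sample size n shrinks the empirical-population gap, while depth l and weight magnitude r can blow it up exponentially.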


Related research

- Understanding Generalization and Optimization Performance of Deep CNNs (05/28/2018)
- The Landscape of Empirical Risk for Non-convex Losses (07/22/2016)
- Landscape Complexity for the Empirical Risk of Generalized Linear Models (12/04/2019)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks (06/22/2021)
- Spectral Analysis and Stability of Deep Neural Dynamics (11/26/2020)
- Stationary Points of Shallow Neural Networks with Quadratic Activation Function (12/03/2019)
- Landscape Correspondence of Empirical and Population Risks in the Eigendecomposition Problem (06/11/2021)
