
The Landscape of Deep Learning Algorithms
This paper studies the landscape of the empirical risk of deep neural networks by theoretically analyzing its convergence to the population risk as well as its stationary points and their properties. For an l-layer linear neural network, we prove that its empirical risk uniformly converges to its population risk at the rate of O(r^{2l} √(d log(l)) / √n), where n is the training sample size, d is the total weight dimension, and r is the magnitude bound on the weights of each layer. Based on this result, we then derive stability and generalization bounds for the empirical risk. We also establish uniform convergence of the gradient of the empirical risk to its population counterpart. We prove a one-to-one correspondence, with convergence guarantees, between the non-degenerate stationary points of the empirical and population risks, which characterizes the landscape of deep neural networks. In addition, we analyze these properties for deep nonlinear neural networks with sigmoid activation functions: we prove analogous convergence results for their empirical risks and gradients, and we analyze the properties of their non-degenerate stationary points. To the best of our knowledge, this work is the first to theoretically characterize the landscape of deep learning algorithms. Our results also provide the sample complexity of training a good deep neural network, as well as theoretical understanding of how the network depth l, the layer width, the network size d, and the parameter magnitude determine the neural network landscape.
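As a rough numerical illustration of what a uniform-convergence rate of this shape implies for sample complexity, the bound can be evaluated and inverted for n. This is a minimal sketch, not part of the paper: the function names are ours, the absolute constant hidden by the O(·) notation is dropped (so the numbers are only order-of-magnitude indicative), and the logarithm is taken as natural.

```python
import math

def convergence_rate(r, l, d, n):
    """Illustrative evaluation of the rate r^{2l} * sqrt(d * log(l)) / sqrt(n),
    with the O(.) constant dropped."""
    return (r ** (2 * l)) * math.sqrt(d * math.log(l)) / math.sqrt(n)

def samples_for_accuracy(r, l, d, eps):
    """Invert the rate: smallest integer n at which it drops to eps or below
    (again ignoring the hidden constant)."""
    return math.ceil((r ** (2 * l)) ** 2 * d * math.log(l) / eps ** 2)

# Hypothetical example: a 3-layer linear network with weight-magnitude bound
# r = 1.5 and d = 10^4 total weight parameters, targeting accuracy eps = 0.1.
n = samples_for_accuracy(r=1.5, l=3, d=10_000, eps=0.1)
print(n, convergence_rate(1.5, 3, 10_000, n))
```

Note how the r^{2l} factor dominates: the required n grows exponentially in the depth l whenever r > 1, while the dependence on the network size d is only linear, matching the qualitative picture in the abstract.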