
The Landscape of Deep Learning Algorithms
This paper studies the landscape of empirical risk of deep neural networ...

Learning with Gradient Descent and Weakly Convex Losses
We study the learning performance of gradient descent when the empirical...

A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics
We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for ...

Minimizing Nonconvex Population Risk from Rough Empirical Risk
Population risk, the expectation of the loss over the sampling mechanis...

Disentangling trainability and generalization in deep learning
A fundamental goal in deep learning is the characterization of trainabil...

Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions
We consider a family of deep neural networks consisting of two groups of...

Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability
We study the detailed pathwise behavior of the discrete-time Langevin a...

Understanding Generalization and Optimization Performance of Deep CNNs
This work seeks to explain the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient descent based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $\mathcal{O}(\sqrt{\theta\varrho/n})$, where $\theta$ denotes the degrees of freedom of the network parameters and $\varrho=\mathcal{O}(\log(\prod_{i=1}^{l} b_i(k_i-s_i+1)/p)+\log(b_{l+1}))$ encapsulates architecture parameters including the kernel size $k_i$, stride $s_i$, pooling size $p$ and parameter magnitude $b_i$. To the best of our knowledge, this is the first generalization bound that depends only on $\mathcal{O}(\log(\prod_{i=1}^{l+1} b_i))$, tighter than existing ones that all involve an exponential term like $\mathcal{O}(\prod_{i=1}^{l+1} b_i)$. Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This helps explain why gradient descent training algorithms usually perform sufficiently well in practice. Furthermore, we prove the one-to-one correspondence and convergence guarantees for the non-degenerate stationary points between the empirical and population risks. This implies that the computed local minimum of the empirical risk is also close to a local minimum of the population risk, thus ensuring the good generalization performance of CNNs.
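To give a feel for why a logarithmic dependence on the parameter magnitudes is described as tighter than a product, here is a minimal sketch. It assumes the two quantities being compared are the log of the product of the per-layer magnitudes and the raw product itself, with all constants hidden by the O-notation dropped. The function names, the depths, and the magnitude value 2.0 are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def log_product(magnitudes):
    """Log of the product of per-layer magnitudes (constants dropped)."""
    # Summing logs avoids overflow and equals log(prod(magnitudes)).
    return sum(math.log(b) for b in magnitudes)


def raw_product(magnitudes):
    """Plain product of per-layer magnitudes, as in product-type bounds."""
    return math.prod(magnitudes)


# Hypothetical setting: every layer has magnitude bound 2.0.
for depth in (2, 8, 32):
    b = [2.0] * (depth + 1)  # l convolutional layers plus one fully connected layer
    print(depth, round(log_product(b), 2), raw_product(b))
```

Under these assumptions the log term grows linearly with depth, while the product grows exponentially, which is the gap the abstract points to.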