Theory II: Landscape of the Empirical Risk in Deep Learning

03/28/2017
by Qianli Liao et al.

Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima. However, the practical observation is that, at least in the case of the most successful Deep Convolutional Neural Networks (DCNNs), practitioners can always increase the network size to fit the training data (an extreme example would be [1]). The most successful DCNNs, such as VGG and ResNets, are best used with a degree of "overparametrization". In this work, we characterize, with a mix of theory and experiments, the landscape of the empirical risk of overparametrized DCNNs. We first prove, in the regression framework, the existence of a large number of degenerate global minimizers with zero empirical error (modulo inconsistent equations). The argument, which relies on Bezout's theorem, is rigorous when the ReLUs are replaced by a polynomial nonlinearity (which empirically works as well). As described in our Theory III paper [2], the same minimizers are degenerate and thus very likely to be found by SGD, which will furthermore select with higher probability the most robust zero-minimizer. We further explore and visualize experimentally the landscape of the empirical risk of a DCNN on CIFAR-10 during the entire training process, and especially the global minima. Finally, based on our theoretical and experimental results, we propose an intuitive model of the landscape of the DCNN empirical loss surface, which might not be as complicated as commonly believed.
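
To make the counting argument concrete, the following is a minimal sketch of how Bezout's theorem enters; the depth L, activation degree d, number of weights W, and number of training points n are illustrative symbols assumed here, not the paper's exact setup. Replace each ReLU by a polynomial activation of degree d, and let f(x; w) denote a depth-L network with W scalar weights. Fitting n training pairs (x_i, y_i) with zero empirical error means solving the polynomial system

    f(x_i; w) - y_i = 0,    i = 1, ..., n,

in which each equation has degree at most d^L in the weights w. Bezout's theorem bounds the number of isolated solutions of such a system by the product of the equation degrees, roughly (d^L)^n; but when the network is overparametrized (W > n), any nonempty solution set is generically a variety of dimension at least W - n, so the zero-error global minimizers form flat, degenerate continua rather than isolated points (modulo inconsistent equations).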


