Global Minima of DNNs: The Plenty Pantry

05/25/2019
by Nicole Mücke, et al.

A common strategy for training deep neural networks (DNNs) is to use very large architectures and to train them until they (almost) achieve zero training error. The good generalization performance empirically observed on test data, even in the presence of substantial label noise, corroborates such a procedure. On the other hand, statistical learning theory tells us that over-fitting models may lead to poor generalization, as occurs e.g. in empirical risk minimization (ERM) over too large hypothesis classes. Inspired by this contradictory behavior, so-called interpolation methods have recently received much attention, and some local averaging schemes with zero training error have been shown to be consistent and to learn at optimal rates. However, there is so far no theoretical analysis of interpolating ERM-like methods. We take a step in this direction by showing that for certain large hypothesis classes, some interpolating ERMs enjoy very good statistical guarantees while others fail in the worst sense. Moreover, we show that the same phenomenon occurs for DNNs with zero training error and sufficiently large architectures.
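To make the training procedure described in the abstract concrete, the following minimal PyTorch sketch (not taken from the paper; the architecture, dataset, and hyperparameters are illustrative assumptions) trains an over-parameterized network on a small synthetic classification task with label noise until it reaches (almost) zero training error, i.e. until it interpolates the noisy labels.

    # Minimal sketch (illustrative only, not the paper's method): train an
    # over-parameterized MLP until it interpolates a small, noisy training set.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Tiny synthetic binary classification task with 20% label noise.
    n, d = 200, 10
    X = torch.randn(n, d)
    y = (X[:, 0] > 0).long()
    noise = torch.rand(n) < 0.2
    y[noise] = 1 - y[noise]

    # Over-parameterized network: far more parameters than training points.
    model = nn.Sequential(
        nn.Linear(d, 2048), nn.ReLU(),
        nn.Linear(2048, 2048), nn.ReLU(),
        nn.Linear(2048, 2),
    )

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(2000):
        opt.zero_grad()
        logits = model(X)
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
        train_err = (logits.argmax(dim=1) != y).float().mean().item()
        if train_err == 0.0:  # (almost) zero training error: the noisy labels are interpolated
            print(f"interpolated noisy labels at step {step}")
            break

Whether such an interpolating solution generalizes well or fails badly is exactly the question the paper addresses for ERM-like methods and DNNs.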


Related research

02/04/2019 · A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
Empirical studies show that gradient based methods can learn deep neural...

11/19/2019 · Information-Theoretic Local Minima Characterization and Regularization
Recent advances in deep learning theory have evoked the study of general...

03/21/2019 · Harmless interpolation of noisy data in regression
A continuing mystery in understanding the empirical success of deep neur...

02/17/2018 · An analysis of training and generalization errors in shallow and deep networks
An open problem around deep networks is the apparent absence of over-fit...

05/05/2021 · A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs
This paper focuses on understanding how the generalization error scales ...

06/13/2018 · Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
Many modern machine learning models are trained to achieve zero or near-...

03/10/2021 · Why Flatness Correlates With Generalization For Deep Neural Networks
The intuition that local flatness of the loss landscape is correlated wi...
