Deep learning generalizes because the parameter-function map is biased towards simple functions

05/22/2018
by Guillermo Valle Pérez, et al.

Deep neural networks generalize remarkably well without explicit regularization, even in the strongly over-parametrized regime. This success suggests that some form of implicit regularization must be at work. By applying a modified version of the coding theorem from algorithmic information theory and by performing extensive empirical analysis of random neural networks, we argue that the parameter-function map of deep neural networks is exponentially biased towards functions with lower descriptional complexity. We show explicitly, for supervised learning of Boolean functions, that the intrinsic simplicity bias of deep neural networks means they generalize significantly better than an unbiased learning algorithm does. This superior generalization due to simplicity bias can be explained using PAC-Bayes theory, which yields useful generalization error bounds for learning Boolean functions across a wide range of complexities. Finally, we provide evidence that deeper neural networks trained on the CIFAR10 data set exhibit stronger simplicity bias than shallow networks, which may help explain why deeper networks generalize better than shallow ones.
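
To make the PAC-Bayes argument concrete: in the realizable setting, a Langford-Seeger style bound of roughly the form alluded to in the abstract reads (a sketch following the standard statement; the constants are not quoted from the paper)

    -\ln(1 - \epsilon) \le \frac{\ln\frac{1}{P(U)} + \ln\frac{2m}{\delta}}{m - 1},

which holds with probability at least 1 - \delta over a training set of m examples, where U is the set of functions consistent with the training data, P(U) is its total probability under the prior induced by random parameters, and \epsilon is the expected generalization error of a consistent function drawn from that prior. A strong simplicity bias makes P(U) large whenever the target function is simple, which tightens the bound.

The "empirical analysis of random neural networks" can be illustrated with a minimal sketch: sample Gaussian parameters for a small ReLU network on n-bit Boolean inputs, record the induced truth table, and compare how often each function appears against a compression-based complexity proxy. The network size, Gaussian parameter prior, and zlib-based Lempel-Ziv-style complexity measure below are illustrative assumptions, not the authors' exact setup.

import itertools
import zlib
from collections import Counter

import numpy as np

n_bits = 7          # input dimension; the truth table has 2^7 = 128 entries
hidden = 40         # hidden-layer width (illustrative choice)
n_samples = 50_000  # number of random parameter draws
rng = np.random.default_rng(0)

# All 2^n Boolean inputs as a (2^n, n) matrix of 0/1 values.
X = np.array(list(itertools.product([0, 1], repeat=n_bits)), dtype=float)

def random_function(rng):
    """Sample i.i.d. Gaussian parameters and return the induced truth table."""
    W1 = rng.normal(size=(n_bits, hidden))
    b1 = rng.normal(size=hidden)
    w2 = rng.normal(size=hidden)
    b2 = rng.normal()
    h = np.maximum(X @ W1 + b1, 0.0)   # ReLU hidden layer
    out = h @ w2 + b2
    return ''.join('1' if o > 0 else '0' for o in out)

def complexity(truth_table):
    """Compressed length of the truth table: a crude Lempel-Ziv-style proxy."""
    return len(zlib.compress(truth_table.encode()))

counts = Counter(random_function(rng) for _ in range(n_samples))

# Empirical probability P(f) of each sampled function versus its complexity.
for f, c in counts.most_common(10):
    print(f"P(f) ~ {c / n_samples:.4f}   complexity ~ {complexity(f)}")

Under this kind of sampling, the paper's claim corresponds to the most frequently produced functions also being the ones with the lowest complexity; plotting P(f) against the complexity proxy over many samples makes the exponential bias visible.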

Related research

12/25/2018
Deep neural networks are biased towards simple functions
We prove that the binary classifiers of bit strings generated by random ...

09/25/2019
Neural networks are a priori biased towards Boolean functions with low entropy
Understanding the inductive bias of neural networks is critical to expla...

11/06/2021
Understanding Layer-wise Contributions in Deep Neural Networks through Spectral Analysis
Spectral analysis is a powerful tool, decomposing any function into simp...

01/31/2019
Effect of Various Regularizers on Model Complexities of Neural Networks in Presence of Input Noise
Deep neural networks are over-parameterized, which implies that the numb...

11/25/2022
The smooth output assumption, and why deep networks are better than wide ones
When several models have similar training scores, classical model select...

09/20/2019
Do Compressed Representations Generalize Better?
One of the most studied problems in machine learning is finding reasonab...

02/12/2021
A Too-Good-to-be-True Prior to Reduce Shortcut Reliance
Despite their impressive performance in object recognition and other tas...
