Neural networks trained with SGD learn distributions of increasing complexity

11/21/2022
by Maria Refinetti, et al.

The ability of deep neural networks to generalise well even when they interpolate their training data has been explained using various "simplicity biases". These theories postulate that neural networks avoid overfitting by first learning simple functions, say a linear classifier, before learning more complex, non-linear functions. Meanwhile, data structure is also recognised as a key ingredient for good generalisation, yet its role in simplicity biases is not yet understood. Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training. We first demonstrate this distributional simplicity bias (DSB) in a solvable model of a neural network trained on synthetic data. We then confirm DSB empirically in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of Gaussian universality in learning.
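
To make the claim concrete, here is a minimal sketch (not the paper's code) of the kind of probe the abstract describes: a small MLP is trained with SGD on a synthetic two-class task and, at regular checkpoints, evaluated both on held-out real data and on "Gaussian clones" that match each class's mean and covariance but carry no higher-order structure. The distributions, network size, learning rate, and the Rademacher construction below are illustrative assumptions; the paper's own experiments use a solvable synthetic model, CIFAR10, and ImageNet-pretrained networks. If the network initially relies only on lower-order statistics, the two accuracy curves should coincide early in training and separate later.

```python
# Minimal sketch (illustrative assumptions, not the paper's setup): compare a
# small MLP's accuracy on real test data with its accuracy on "Gaussian
# clones" -- samples drawn with each class's mean and covariance but no
# higher-order structure. Under the distributional simplicity bias, the two
# curves should coincide early in training and separate later.

import torch
import torch.nn as nn

torch.manual_seed(0)

D = 50          # input dimension (arbitrary)
SHIFT = 0.05    # small mean shift: the only signal visible to lower-order statistics
N_TRAIN, N_TEST = 4000, 4000


def sample_real(n):
    """Class 0: Gaussian around +SHIFT. Class 1: Rademacher (+/-1 entries)
    around -SHIFT. Both classes have identity covariance, so they differ
    weakly in their means and strongly in higher-order statistics."""
    n0, n1 = n // 2, n - n // 2
    x0 = SHIFT + torch.randn(n0, D)
    x1 = -SHIFT + torch.sign(torch.randn(n1, D))   # entries are exactly +/-1
    y = torch.cat([torch.zeros(n0), torch.ones(n1)]).long()
    return torch.cat([x0, x1]), y


def sample_gaussian_clone(n):
    """Same per-class (population) mean and covariance as the real data, but
    Gaussian: everything beyond the first two moments is removed."""
    n0, n1 = n // 2, n - n // 2
    x0 = SHIFT + torch.randn(n0, D)
    x1 = -SHIFT + torch.randn(n1, D)
    y = torch.cat([torch.zeros(n0), torch.ones(n1)]).long()
    return torch.cat([x0, x1]), y


model = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

x_train, y_train = sample_real(N_TRAIN)
x_real, y_real = sample_real(N_TEST)
x_clone, y_clone = sample_gaussian_clone(N_TEST)


@torch.no_grad()
def accuracy(x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()


for step in range(5001):
    idx = torch.randint(0, N_TRAIN, (128,))        # plain SGD on mini-batches
    loss = loss_fn(model(x_train[idx]), y_train[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:5d}  real acc {accuracy(x_real, y_real):.3f}  "
              f"gaussian-clone acc {accuracy(x_clone, y_clone):.3f}")
```

In this toy setup the gap between the two accuracies is a rough proxy for how much the network relies on statistics beyond the mean and covariance: as long as the curves coincide, replacing the data with its Gaussian approximation would not change the network's predictions.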

Related research

- The Gaussian equivalence of generative models for learning with two-layer neural networks (06/25/2020)
- A stochastic optimization approach to train non-linear neural networks with a higher-order variation regularization (08/04/2023)
- Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias (05/30/2023)
- The Gaussian equivalence of generative models for learning with shallow neural networks (06/01/2021)
- Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks (10/04/2022)
- Environment Diversification with Multi-head Neural Network for Invariant Learning (08/17/2023)
- Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods (10/28/2016)
