On the Implicit Biases of Architecture & Gradient Descent

10/08/2021
by Jeremy Bernstein, et al.

Do neural networks generalise because of bias in the functions returned by gradient descent, or bias already present in the network architecture? Why not both? This paper finds that while typical networks that fit the training data already generalise fairly well, gradient descent can further improve generalisation by selecting networks with a large margin. This conclusion rests on a careful comparison between infinite width networks trained by Bayesian inference and finite width networks trained by gradient descent. To measure the implicit bias of architecture, new technical tools are developed that both analytically bound and consistently estimate the average test error of the neural network–Gaussian process (NNGP) posterior. This error is found to be already better than chance, corroborating the findings of Valle-Pérez et al. (2019) and underscoring the importance of architecture. Going beyond this result, the paper finds that test performance can be substantially improved by selecting a function with a much larger margin than is typical under the NNGP posterior. This highlights a curious fact: minimum a posteriori functions can generalise best, and gradient descent can select for those functions. In summary, the new technical tools suggest a nuanced portrait of generalisation involving both the implicit biases of architecture and gradient descent. Code for this paper is available at: https://github.com/jxbz/implicit-bias/.
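The central quantity above, the average test error of the NNGP posterior, can be illustrated on a toy problem with a plain Gaussian-process regressor. The sketch below is hypothetical and not the paper's code: the squared-exponential kernel and the synthetic labelling rule are stand-ins for the NNGP kernel and a real dataset, chosen only to show how posterior-mean prediction yields a better-than-chance test error.

```python
# Illustrative sketch (not the paper's implementation): posterior-mean
# prediction of a Gaussian process, the infinite-width analogue of
# Bayesian inference in a neural network. Kernel and data are hypothetical.
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    # squared-exponential kernel, a stand-in for an NNGP kernel
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def gp_posterior_mean(X_train, y_train, X_test, noise=1e-3):
    # standard GP regression posterior mean: K_* (K + noise I)^{-1} y
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

if __name__ == "__main__":
    # toy binary task: the label is the sign of the first input coordinate
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(50, 2))
    y_train = np.sign(X_train[:, 0])
    X_test = rng.normal(size=(200, 2))
    y_test = np.sign(X_test[:, 0])

    # threshold the posterior mean to classify, then measure test error
    preds = np.sign(gp_posterior_mean(X_train, y_train, X_test))
    test_error = np.mean(preds != y_test)
    print(f"posterior-mean test error: {test_error:.2f}")
```

On this toy task the posterior-mean classifier lands well below the chance error of 0.5, mirroring the paper's observation that the architecture's prior alone already generalises better than chance.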


