
Learning Deep Neural Networks by Iterative Linearisation

The excellent real-world performance of deep neural networks has received increasing attention. Despite the capacity to overfit significantly, such large models work better than smaller ones. This phenomenon is often referred to as the scaling law by practitioners. It is of fundamental interest to study why the scaling law exists and how it avoids/controls overfitting. One approach has been looking at infinite width limits of neural networks (e.g., Neural Tangent Kernels, Gaussian Processes); however, in practice, these do not fully explain finite networks as their infinite counterparts do not learn features. Furthermore, the empirical kernel for finite networks (i.e., the inner product of feature vectors) changes significantly during training, in contrast to infinite width networks. In this work we derive an iterative linearised training method. We justify iterative linearisation as an interpolation between finite analogs of the infinite width regime, which do not learn features, and standard gradient descent training, which does. We show some preliminary results where iterative linearised training works well, noting in particular how much feature learning is required to achieve comparable performance. We also provide novel insights into the training behaviour of neural networks.





1 Introduction

Deep neural networks perform well on a wide variety of tasks despite their overparameterisation and capacity to memorise random labels [Zhang et al., 2017], often with improved generalisation behaviour as the number of parameters increases [Nakkiran et al., 2020]. This goes contrary to classical beliefs around learning theory and overfitting, meaning there is likely some implicit regularisation inducing an inductive bias which encourages the networks to converge to well-generalising solutions. One approach to investigate this has been to examine infinite width limits of neural networks using the Neural Tangent Kernel (NTK) [Jacot et al., 2018, Lee et al., 2019]. Interestingly, these often do worse than standard neural networks, though with extra tricks they can perform equivalently well or better in certain scenarios [Lee et al., 2020].

Similarly, despite their use for analysis due to having closed-form expressions for many terms, they don't predict finite network behaviour very closely in many regards. For example, because they do not learn features, they cannot be used for transfer learning, and the empirical NTK (outer product of Jacobians) changes significantly throughout training, whereas NTK theory states that in the infinite limit it is constant. This raises important questions about in what ways they differ, most of which can be summarised as asking how feature learning affects the networks that can be learnt.

To work towards answering this question, we look at an interpolation between standard training and a finite analog of infinite-width training where we fix the empirical NTK by performing weight space linearisation. One interpretation of this is that we are varying the amount of feature learning allowed. We find that essentially any amount of feature learning is enough to eventually converge to a similarly performing network, assuming learning rates are small enough.

1.1 Related Work

Li et al. [2019] create an enhanced NTK for CIFAR10 with significantly better empirical performance than the standard one; however, it still performs worse than the best neural networks. Yang and Hu [2021] use a different limit to allow feature learning. However, neither of these gives much insight into why the standard parameterisation doesn't work well.

Lee et al. [2020] run an empirical study comparing finite and infinite networks under many scenarios and Fort et al. [2020] look at how far SGD training is from fixed-NTK training, and at what point they tend to converge. Lewkowycz et al. [2020] investigate at what points in training the kernel regime applies as a good model of finite network behaviour. Both find better agreement later in training.

Chizat et al. [2019] consider a different way to make finite networks closer to their infinite width analogs by scaling in a particular way, finding that as they get closer to their infinite width analogs, they perform less well empirically.

2 Problem Formulation

Consider a neural network $f(x; \theta)$ parameterised by weights $\theta$ and a mean squared error loss function $\mathcal{L} = \frac{1}{2}\|f(X;\theta) - Y\|^2$,¹ where we minimise for data $X$ and labels $Y$. We can write the change in the function over time under gradient flow with learning rate $\eta$ as:

$$\frac{\partial f(X;\theta_t)}{\partial t} = -\eta\, \hat\Theta_t(X,X)\, \nabla_{f(X;\theta_t)} \mathcal{L} \tag{1}$$

¹ We use MSE for simplicity and compatibility with NTK results here. While this is needed for some NTK results, it does not affect the algorithms we propose, where any differentiable loss function can be used; see Appendix A.

It has been shown [Jacot et al., 2018, Lee et al., 2019, Arora et al., 2019] that in the infinite width limit the empirical neural tangent kernel, $\hat\Theta_t = \nabla_\theta f(X;\theta_t)\, \nabla_\theta f(X;\theta_t)^\top$, converges to a deterministic NTK, $\Theta$. This is a matrix dependent only on architecture and data and does not change during training. From this perspective, training the infinite width model under gradient flow (or gradient descent with a small step size) is equivalent to training the weight-space linearisation of the neural network [Lee et al., 2019]. This raises a number of interesting questions about why training the linearisation doesn't work well in finite networks and what is different in them. This is likely due to a lack of enough random features, whereas running gradient descent on the full network allows features to be learnt, reducing the reliance on having enough initial random features.
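To make the empirical kernel concrete, the sketch below computes $\hat\Theta = \nabla_\theta f\, \nabla_\theta f^\top$ for a tiny one-hidden-layer network. The model, sizes, and helper names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Minimal sketch: the empirical NTK is the Gram matrix of per-example
# parameter gradients. Toy scalar-output model f(x) = v . tanh(W x).
rng = np.random.default_rng(1)
d, h, n = 4, 32, 8
X = rng.normal(size=(n, d))
W = rng.normal(size=(h, d)) / np.sqrt(d)   # input weights
v = rng.normal(size=h) / np.sqrt(h)        # output weights

def jacobian(W, v, X):
    # rows are grad_theta f(x_i; theta)
    a = np.tanh(X @ W.T)                                  # (n, h)
    JW = ((1 - a ** 2) * v)[:, :, None] * X[:, None, :]   # df/dW, (n, h, d)
    return np.concatenate([JW.reshape(len(X), -1), a], axis=1)

J = jacobian(W, v, X)
ntk = J @ J.T                              # empirical NTK, Theta_hat(X, X)

# As a Gram matrix, Theta_hat is symmetric and positive semi-definite.
print("symmetric:", np.allclose(ntk, ntk.T))
print("PSD:", np.all(np.linalg.eigvalsh(ntk) > -1e-8))
```

For a finite network this matrix changes as $\theta$ moves; in the infinite width limit it stays fixed at its value at initialisation.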

3 Iterative Linearisation

NTK theory says that if the width is large enough then training the weight-space linearisation is equivalent to training the full network [Lee et al., 2019]. However, in practice, training the fully linearised network performs very poorly for practically sized networks [Lee et al., 2020]. In this section we propose iterative linearisation in order to interpolate between training of the standard network and the linearised network.

Consider standard (full batch) gradient descent on a neural network:

$$\theta_{t+1} = \theta_t + \eta\, \nabla_\theta f(X;\theta_t)^\top \left(Y - f(X;\theta_t)\right) \tag{2}$$

Here we can think of this as two separate variables we update each step, the weights $\theta_t$ and the features $\phi_t = \nabla_\theta f(X;\theta_t)$. However there is no requirement that we always update both, giving rise to the following generalised algorithm:

$$\theta_{t+1} = \theta_t + \eta\, \phi_s^\top \left(Y - f^{\mathrm{lin}_s}(X;\theta_t)\right) \tag{3}$$
$$\phi_s = \nabla_\theta f(X;\theta_s) \tag{4}$$

where $s = K\lfloor t/K \rfloor$. In addition, we write the linearised version of a neural network using its first order Taylor expansion at the weights $\theta_s$ as

$$f^{\mathrm{lin}_s}(x;\theta) = f(x;\theta_s) + \nabla_\theta f(x;\theta_s)\,(\theta - \theta_s) \tag{5}$$

Using this framework, when $K = 1$ this is simply gradient descent and when $K = \infty$ it is fully linearised training. Other values of $K$ interpolate between these two extremes. See Algorithm 1 for more details. Note that we can also generalise this to not be periodic in terms of when we update $\phi$, so we call this version fixed period iterative linearisation.

Input: learning rate $\eta$, update period $K$, pre-initialised parameters $\theta_1$
$\phi \leftarrow \nabla_\theta f(X;\theta_1)$
for t = 1..epochs do
       $\theta_{t+1} \leftarrow \theta_t + \eta\, \phi^\top \left(Y - f^{\mathrm{lin}}(X;\theta_t)\right)$
       if t mod K = 0 then
              $\phi \leftarrow \nabla_\theta f(X;\theta_{t+1})$   (update the features)
       end if
end for
Algorithm 1 Iterative Linearisation (fixed period)
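A minimal numpy sketch of fixed-period iterative linearisation with MSE loss follows. The toy one-hidden-layer model, the sizes, and the helper names (`forward`, `jacobian`, `iterative_linearisation`) are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 3, 16
X = rng.normal(size=(n, d))
Y = np.sin(X @ rng.normal(size=d))               # arbitrary smooth targets

def unpack(theta):
    return theta[:h * d].reshape(h, d), theta[h * d:]

def forward(theta, X):
    W, v = unpack(theta)
    return np.tanh(X @ W.T) @ v                  # f(X; theta), shape (n,)

def jacobian(theta, X):
    # rows are grad_theta f(x_i; theta) -- the 'features' phi
    W, v = unpack(theta)
    a = np.tanh(X @ W.T)
    JW = ((1 - a ** 2) * v)[:, :, None] * X[:, None, :]
    return np.concatenate([JW.reshape(len(X), -1), a], axis=1)

def iterative_linearisation(theta0, eta, K, steps):
    theta = theta0.copy()
    for t in range(steps):
        if t % K == 0:                           # feature update (Eq. 4)
            theta_s = theta.copy()
            phi = jacobian(theta_s, X)
            f_s = forward(theta_s, X)
        f_lin = f_s + phi @ (theta - theta_s)    # Taylor expansion (Eq. 5)
        theta = theta + eta * phi.T @ (Y - f_lin) / n   # GD step (Eq. 3)
    return theta

theta0 = rng.normal(size=h * d + h) * 0.5
for K in (1, 10, 100):                           # K=1 recovers standard GD
    theta = iterative_linearisation(theta0, eta=0.1, K=K, steps=500)
    print(f"K={K:3d}  final MSE={np.mean((forward(theta, X) - Y) ** 2):.4f}")
```

With $K = 1$ the linearisation point is refreshed every step, so the update reduces exactly to gradient descent on the full network.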

3.1 Interpreting Iterative Linearisation

It is insightful to look a bit more closely at what is happening in Algorithm 1. In standard training of a linear model (e.g. $f(x) = \phi(x)^\top\theta$), the 'features' $\phi(x)$ are fixed and learning decides how to use those features. This is what is happening in Equation (3): with $K = \infty$, we never learn any new features beyond the random ones we got through the initialisation of the network. Interestingly, we can say the same for infinite width networks. The idea that infinite width networks don't learn features is not new (see Yang and Hu [2021] for work trying to avoid this pitfall), but the finite analogy gives us a new perspective on what is happening in Algorithm 1. We therefore call Equation (4) the feature learning step, noting that no feature learning can happen through Equation (3) alone. From this interpretation, the Jacobian $\phi_s = \nabla_\theta f(X;\theta_s)$ gives the features we are using at time $t$, and $K$ tells us how frequently to update the features, putting a limit on the frequency of feature learning updates.
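The $K = \infty$ case can be made concrete: with the features never updated, Equation (3) is just gradient descent on a linear least squares problem in the random features, and it converges to the same solution as a direct solver. A small sketch under assumed toy dimensions (the feature matrix here stands in for the Jacobian at initialisation):

```python
import numpy as np

# With K = infinity, iterative linearisation is linear regression on the
# fixed features phi_0 = grad_theta f(X; theta_0). We work in
# delta = theta - theta_0, starting from delta = 0.
rng = np.random.default_rng(2)
n, p = 12, 40
phi0 = rng.normal(size=(n, p))     # fixed 'features' (Jacobian at init)
f0 = rng.normal(size=n)            # network outputs at initialisation
Y = rng.normal(size=n)

theta = np.zeros(p)
for _ in range(5000):              # Eq. 3 with s = 0 forever
    resid = Y - (f0 + phi0 @ theta)
    theta += 0.01 * phi0.T @ resid / n

# Closed-form minimum-norm least squares solution for comparison
delta_star, *_ = np.linalg.lstsq(phi0, Y - f0, rcond=None)
print("matches lstsq:", np.allclose(theta, delta_star, atol=1e-3))
```

In the overparameterised regime ($p > n$), gradient descent from zero stays in the row space of the features and so converges to the minimum-norm interpolant, matching `lstsq`.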

4 Results

Figure 1: Iterative linearisation results on MNIST and CIFAR10 with learning rates of 1e-4 and 1e-5.

Figure 2: Iterative linearisation on CIFAR10 with large $K$.

To examine the effect of increasing $K$, or equivalently reducing the feature learning frequency, we run a number of experiments on MNIST and CIFAR10 with a slightly larger variant of LeNet [LeCun et al., 1998] with 50 channels in each convolutional layer and a softmax output. The purpose of these experiments is not to achieve amazing performance on the datasets (it only gets to around 60% for CIFAR10) but to examine how training changes as the learning rate and update frequency change. For these experiments we note that, unlike general NTK theory, no step of our derivation relies on the use of MSE, so we instead use cross-entropy loss as is standard for image classification. We additionally include the softmax in the loss function so that it is not linearised by the algorithm. This means both that we continue to have an output for which taking the cross-entropy loss is meaningful, and that we avoid numerical issues caused by the linearisation happening in the middle of the logsumexp trick.

Figure 1 shows the results of these experiments. For MNIST, all values of $K$ follow almost exactly the same learning trajectory, converging very quickly and not saturating the amount learnt from each set of features before updating. For large $K$, it can be seen that the training levels off each time before $\phi$ is updated. This shows that we can get close to 100% accuracy on MNIST by updating the features from their initialisation only twice (all but the largest $K$ are within a 0.4% range at the end of training). From this we can conclude that the initialisation of a neural network with this architecture creates features that are not too far from what is needed to solve MNIST. Compare this with CIFAR10 (learning rate 1e-5), where it still takes only a few feature vector updates to reach the performance of standard training, though this is a much lower accuracy. It is still unclear from these experiments how the small number of updates necessary interacts with different architectures on the same dataset.

Comparing CIFAR10 with learning rates of 1e-4 and 1e-5 also gives interesting conclusions. For 1e-4, training diverges as soon as $K$ goes above 5, and is unstable even for $K = 5$, whereas for 1e-5 training stays stable for every $K$ we tried (see Figure 2). Note that this is not simply because it's moving less far, as this is a 10x reduction in the step size but a 4000x increase in $K$. We include SGD training for comparison, noting that plotting per-step rather than per-epoch gives similar performance to GD training, and so we should expect most runs to reach ~60% accuracy if run for twice as long.

5 Conclusion

This paper has proposed iterative linearisation, a new training algorithm that interpolates between gradient descent on the standard and the linearised neural network, as a finite analog of the infinite width versus finite network distinction. We show that, at least in the case of a LeNet-like architecture with small learning rates, any amount of feature learning is enough to converge to a similarly performing model. This provides an important step towards understanding feature learning and the distinction between how infinite and finite width networks learn. Better understanding how networks change with large numbers of parameters has important connections to empirical phenomena such as deep double descent [Nakkiran et al., 2020].

5.1 Future Work

It is important to do more rigorous empirical investigations to confirm these results, in particular to scale up to larger models in order to disentangle the impact of iterative linearisation training from the fact that this architecture will never do particularly well on CIFAR10. This is also important to better understand under what architectures/learning rates/frequencies iterative linearisation training is stable.

Another direction is to better understand the types of solutions that iterative linearisation finds for various values of $K$. This will shed light on how the inductive bias is changing, in particular understanding whether all finite $K$ find similar solutions and the infinite width limit is a step change, similar to the test performance in the experiments here, or whether there is a gradual change towards the solutions which don't learn features.

Finally, we only consider fixed period iterative linearisation here, where we update the feature vector at regular intervals. However, Fort et al. [2020] showed that the empirical NTK changes faster earlier in training, so it would make sense for $K$ to be more adaptive if this were to be used directly for training.


  • S. Arora, S. S. Du, W. Hu, Z. Li, R. Salakhutdinov, and R. Wang (2019) On exact computation with an infinitely wide neural net. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 8139–8148. External Links: Link Cited by: §2.
  • L. Chizat, E. Oyallon, and F. R. Bach (2019) On lazy training in differentiable programming. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 2933–2943. External Links: Link Cited by: §1.1.
  • S. Fort, G. K. Dziugaite, M. Paul, S. Kharaghani, D. M. Roy, and S. Ganguli (2020) Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: §1.1, §5.1.
  • A. Jacot, C. Hongler, and F. Gabriel (2018) Neural tangent kernel: convergence and generalization in neural networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 8580–8589. External Links: Link Cited by: §1, §2.
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), pp. 2278–2324. External Links: Link, Document Cited by: §4.
  • J. Lee, S. S. Schoenholz, J. Pennington, B. Adlam, L. Xiao, R. Novak, and J. Sohl-Dickstein (2020) Finite versus infinite neural networks: an empirical study. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: §1.1, §1, §3.
  • J. Lee, L. Xiao, S. S. Schoenholz, Y. Bahri, R. Novak, J. Sohl-Dickstein, and J. Pennington (2019) Wide neural networks of any depth evolve as linear models under gradient descent. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 8570–8581. External Links: Link Cited by: §1, §2, §3.
  • A. Lewkowycz, Y. Bahri, E. Dyer, J. Sohl-Dickstein, and G. Gur-Ari (2020) The large learning rate phase of deep learning: the catapult mechanism. CoRR abs/2003.02218. External Links: Link, 2003.02218 Cited by: §1.1.
  • Z. Li, R. Wang, D. Yu, S. S. Du, W. Hu, R. Salakhutdinov, and S. Arora (2019) Enhanced convolutional neural tangent kernels. CoRR abs/1911.00809. External Links: Link, 1911.00809 Cited by: §1.1.
  • P. Nakkiran, G. Kaplun, Y. Bansal, T. Yang, B. Barak, and I. Sutskever (2020) Deep double descent: where bigger models and more data hurt. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, External Links: Link Cited by: §1, §5.
  • G. Yang and E. J. Hu (2021) Tensor programs IV: feature learning in infinite-width neural networks. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, M. Meila and T. Zhang (Eds.), Proceedings of Machine Learning Research, Vol. 139, pp. 11727–11737. External Links: Link Cited by: §1.1, §3.1.
  • C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals (2017) Understanding deep learning requires rethinking generalization. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, External Links: Link Cited by: §1.

Appendix A Iterative linearisation with a general loss function

In Section 3 we show how to get to iterative linearisation from standard gradient descent under mean squared error loss. The use of mean squared error is more instructive due to its similarities with NTK results, however it is not strictly necessary. For completeness we include here the same idea but for a general loss function $\mathcal{L}$.

Standard gradient descent on a function $f$ parameterised by $\theta$, with step size $\eta$ and data $X$, can be written as

$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta \mathcal{L}(f(X;\theta_t), Y)$$

We can apply the chain rule, resulting in

$$\theta_{t+1} = \theta_t - \eta\, \nabla_\theta f(X;\theta_t)^\top\, \mathcal{L}'(f(X;\theta_t), Y)$$

where $\mathcal{L}'$ is the derivative of $\mathcal{L}$ with respect to the function outputs (in the case of mean squared error, this is the residual: $f(X;\theta_t) - Y$). Now again using $\phi_t = \nabla_\theta f(X;\theta_t)$, we can write this as

$$\theta_{t+1} = \theta_t - \eta\, \phi_t^\top\, \mathcal{L}'(f(X;\theta_t), Y)$$

With a similar argument to Section 3, we note that we don't need to update the features every step, resulting in the following formulation:

$$\theta_{t+1} = \theta_t - \eta\, \phi_s^\top\, \mathcal{L}'(f^{\mathrm{lin}_s}(X;\theta_t), Y), \qquad \phi_s = \nabla_\theta f(X;\theta_s), \qquad s = K\lfloor t/K \rfloor$$

This now lets us use softmax followed by cross-entropy in the loss while maintaining the same interpretation, as we do for the MNIST and CIFAR10 results.
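For softmax followed by cross-entropy, the derivative of the loss with respect to the logits has the well-known closed form $\mathcal{L}'(f, Y) = \mathrm{softmax}(f) - Y$. The sketch below plugs this into the generalised update on a fixed set of assumed toy features (standing in for $\phi_s$), keeping the softmax inside the loss so it is never linearised:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, c = 30, 10, 3
phi = rng.normal(size=(n, p))              # features grad_theta f at theta_s
labels = rng.integers(0, c, size=n)
Y = np.eye(c)[labels]                      # one-hot targets

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(theta):
    probs = softmax(phi @ theta)           # softmax applied to the logits
    return -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))

theta = np.zeros((p, c))
losses = [cross_entropy(theta)]
for _ in range(200):
    Lprime = softmax(phi @ theta) - Y      # L'(f, Y) for softmax + CE
    theta -= 0.1 * phi.T @ Lprime / n      # theta_{t+1} = theta_t - eta phi^T L'
    losses.append(cross_entropy(theta))

print("loss decreased:", losses[-1] < losses[0])
```

Only the network outputs (the logits) are linearised; the softmax and the log in the loss remain exact, which is what keeps the cross-entropy meaningful and sidesteps the logsumexp numerical issues mentioned in Section 4.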