Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width

02/10/2020
by   Yu Bai, et al.
20

We propose Taylorized training as an initiative towards better understanding neural network training at finite width. Taylorized training involves training the k-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training—a recently proposed theory for understanding the success of deep learning. We experiment with Taylorized training on modern neural network architectures, and show that Taylorized training (1) agrees with full neural network training increasingly better as we increase k, and (2) can significantly close the performance gap between linearized and full training. Compared with linearized training, higher-order training works in more realistic settings such as standard parameterization and large (initial) learning rate. We complement our experiments with theoretical results showing that the approximation error of k-th order Taylorized models decay exponentially over k in wide neural networks.

READ FULL TEXT
research
02/01/2023

Gradient Descent in Neural Networks as Sequential Learning in RKBS

The study of Neural Tangent Kernels (NTKs) has provided much needed insi...
research
10/30/2016

A Theoretical Study of The Relationship Between Whole An ELM Network and Its Subnetworks

A biological neural network is constituted by numerous subnetworks and m...
research
02/14/2019

Superposition of many models into one

We present a method for storing multiple models within a single set of p...
research
03/27/2020

On the Optimization Dynamics of Wide Hypernetworks

Recent results in the theoretical study of deep learning have shown that...
research
12/09/2017

Peephole: Predicting Network Performance Before Training

The quest for performant networks has been a significant force that driv...
research
09/25/2019

Asymptotics of Wide Networks from Feynman Diagrams

Understanding the asymptotic behavior of wide networks is of considerabl...
research
06/03/2023

Random matrix theory and the loss surfaces of neural networks

Neural network models are one of the most successful approaches to machi...

Please sign up or login with your details

Forgot password? Click here to reset