What can linearized neural networks actually say about generalization?

06/12/2021
by Guillermo Ortiz-Jiménez, et al.

For certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization. However, for the networks used in practice, the empirical NTK only provides a rough first-order approximation of these architectures. Still, a growing body of work leverages this approximation to successfully analyze important deep learning phenomena and derive algorithms for new applications. In our work, we provide strong empirical evidence assessing the practical validity of this approximation by conducting a systematic comparison of the behaviour of different neural networks and their linear approximations across a range of tasks. We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks, albeit with important nuances. Specifically, we discover that, in contrast to what was previously observed, neural networks do not always perform better than their kernel approximations, and we reveal that the performance gap heavily depends on the architecture, the number of samples, and the training task. In fact, we show that during training, deep networks increase the alignment of their empirical NTK with the target task, which is why linear approximations taken at the end of training better describe the dynamics of deep networks. Overall, our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research, and it offers a new perspective on the use of the NTK approximation in deep learning.
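Two quantities in the abstract are easy to make concrete: the empirical NTK of a finite network (the kernel of its first-order, linearized approximation around the current parameters) and the alignment of that kernel with a target task. The sketch below, in JAX, is a minimal illustration rather than the authors' code; the model `mlp`, the helpers `empirical_ntk` and `kernel_target_alignment`, and the toy data are assumptions made for the example.

    import jax
    import jax.numpy as jnp

    def mlp(params, x):
        # A tiny two-layer network; any differentiable model would do here.
        w1, b1, w2, b2 = params
        h = jnp.tanh(x @ w1 + b1)
        return (h @ w2 + b2).squeeze()

    def empirical_ntk(params, x1, x2):
        # K[i, j] = <df(x1_i)/dparams, df(x2_j)/dparams>, i.e. the kernel of the
        # network's first-order (linearized) approximation around `params`.
        def flat_grad(x):
            grads = jax.grad(lambda p: mlp(p, x))(params)
            return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])
        j1 = jax.vmap(flat_grad)(x1)   # (n1, n_params)
        j2 = jax.vmap(flat_grad)(x2)   # (n2, n_params)
        return j1 @ j2.T

    def kernel_target_alignment(K, y):
        # <K, y y^T>_F / (||K||_F * ||y y^T||_F): one standard way to measure
        # how well a kernel "matches" the labels of a task.
        Y = jnp.outer(y, y)
        return jnp.sum(K * Y) / (jnp.linalg.norm(K) * jnp.linalg.norm(Y))

    # Toy parameters and data, purely illustrative.
    key = jax.random.PRNGKey(0)
    k1, k2, k3 = jax.random.split(key, 3)
    params = (jax.random.normal(k1, (2, 16)) / jnp.sqrt(2.0), jnp.zeros(16),
              jax.random.normal(k2, (16, 1)) / 4.0, jnp.zeros(1))
    x = jax.random.normal(k3, (32, 2))
    y = jnp.sign(x[:, 0])              # binary targets in {-1, +1}

    K = empirical_ntk(params, x, x)
    print(kernel_target_alignment(K, y))   # alignment at initialization

Recomputing the alignment with the parameters obtained after training would, following the abstract's observation, typically show it increasing as the network adapts its empirical NTK to the task.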


