
What can linearized neural networks actually say about generalization?

06/12/2021
by Guillermo Ortiz-Jiménez, et al. (EPFL, ETH Zurich)

For certain infinitely wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization. However, for the networks used in practice, the empirical NTK only provides a rough first-order approximation of these architectures. Still, a growing body of work keeps leveraging this approximation to successfully analyze important deep learning phenomena and derive algorithms for new applications. In our work, we provide strong empirical evidence to assess the practical validity of such an approximation by conducting a systematic comparison of the behaviour of different neural networks and their linear approximations on different tasks. We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks, albeit with important nuances. Specifically, we discover that, in contrast to what was previously reported, neural networks do not always perform better than their kernel approximations, and we reveal that the performance gap heavily depends on the architecture, the number of samples, and the training task. In fact, we show that during training, deep networks increase the alignment of their empirical NTK with the target task, which explains why linear approximations taken at the end of training can better capture the dynamics of deep networks. Overall, our work provides concrete examples of novel deep learning phenomena that can inspire future theoretical research, and offers a new perspective on the use of the NTK approximation in deep learning.
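
To make the objects discussed above concrete, here is a minimal sketch (not the authors' code) of how one might build the linearization of a network around fixed parameters, compute the empirical NTK, and measure its alignment with a target. It is written in JAX, assumes a scalar-output model exposed as apply_fn(params, x), and all names are illustrative.

```python
import jax
import jax.numpy as jnp


def linearize(apply_fn, params_0):
    """First-order Taylor expansion of the model around params_0:
    f_lin(params, x) = f(params_0, x) + J_f(params_0, x) . (params - params_0)."""
    def f_lin(params, x):
        dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params_0)
        f0, jvp = jax.jvp(lambda p: apply_fn(p, x), (params_0,), (dparams,))
        return f0 + jvp
    return f_lin


def empirical_ntk(apply_fn, params, x1, x2):
    """Gram matrix K[i, j] = <grad_params f(x1_i), grad_params f(x2_j)>,
    assuming a scalar-output model to keep the sketch short."""
    def grad_flat(x):
        g = jax.grad(lambda p: apply_fn(p, x).squeeze())(params)
        return jnp.concatenate([jnp.ravel(leaf) for leaf in jax.tree_util.tree_leaves(g)])
    J1 = jax.vmap(grad_flat)(x1)   # shape (n1, num_params)
    J2 = jax.vmap(grad_flat)(x2)   # shape (n2, num_params)
    return J1 @ J2.T


def kernel_target_alignment(K, y):
    """Centered alignment between a Gram matrix K and binary (+/-1) labels y."""
    n = K.shape[0]
    H = jnp.eye(n) - jnp.ones((n, n)) / n   # centering matrix
    Kc = H @ K @ H
    Yc = H @ jnp.outer(y, y) @ H
    return jnp.sum(Kc * Yc) / (jnp.linalg.norm(Kc) * jnp.linalg.norm(Yc))
```

Training f_lin with the same loss and optimizer as the original network is one way to realize the linear (kernel) approximation referred to above, and tracking kernel_target_alignment of the empirical NTK before and after training is one way to observe the alignment increase the abstract describes.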
