
What can linearized neural networks actually say about generalization?

by   Guillermo Ortiz-Jiménez, et al.
ETH Zurich

For certain infinitely wide neural networks, neural tangent kernel (NTK) theory fully characterizes generalization. For the networks used in practice, however, the empirical NTK is only a rough first-order approximation of these architectures. Still, a growing body of work leverages this approximation to successfully analyze important deep learning phenomena and to derive algorithms for new applications. In our work, we provide strong empirical evidence on the practical validity of this approximation by systematically comparing the behaviour of different neural networks and their linear approximations on different tasks. We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks, albeit with important nuances. Specifically, we find that, in contrast to what was previously observed, neural networks do not always perform better than their kernel approximations, and that the performance gap depends heavily on the architecture, the number of samples, and the training task. In fact, we show that during training, deep networks increase the alignment of their empirical NTK with the target task, which explains why linear approximations taken at the end of training can better describe the dynamics of deep networks. Overall, our work provides concrete examples of novel deep learning phenomena that can inspire future theoretical research, and offers a new perspective on the use of the NTK approximation in deep learning.
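To make the two central quantities concrete, here is a minimal sketch of the empirical NTK and of kernel–target alignment for a tiny one-hidden-layer ReLU network. This is an illustration of the standard definitions, not the paper's code: the helper names (`empirical_ntk`, `kernel_alignment`) and the toy architecture are our own, and real experiments would use a deep network and automatic differentiation rather than hand-written gradients.

```python
import numpy as np

def param_grad(W, v, x):
    """Per-sample gradient of f(x) = v . relu(W x) w.r.t. all parameters (W, v)."""
    pre = W @ x
    h = np.maximum(pre, 0.0)                 # hidden activations
    mask = (pre > 0).astype(float)           # ReLU derivative
    dW = np.outer(v * mask, x)               # df/dW
    dv = h                                   # df/dv
    return np.concatenate([dW.ravel(), dv])  # flattened gradient vector

def empirical_ntk(W, v, X):
    """Empirical NTK: K_ij = <grad_theta f(x_i), grad_theta f(x_j)>."""
    G = np.stack([param_grad(W, v, x) for x in X])  # (n_samples, n_params)
    return G @ G.T

def kernel_alignment(K, y):
    """Kernel-target alignment: <K, y y^T>_F / (||K||_F ||y y^T||_F)."""
    yyT = np.outer(y, y)
    return float(np.sum(K * yyT) / (np.linalg.norm(K) * np.linalg.norm(yyT)))
```

Tracking `kernel_alignment(empirical_ntk(W, v, X), y)` over the course of training is one way to observe the phenomenon described above: the alignment of the empirical NTK with the target typically grows as the network learns the task.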



