On the linearity of large non-linear models: when and why the tangent kernel is constant

10/02/2020
by Chaoyue Liu, et al.

The goal of this work is to shed light on the remarkable phenomenon of transition to linearity of certain neural networks as their width approaches infinity. We show that the transition to linearity of the model and, equivalently, the constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width. We present a general framework for understanding the constancy of the tangent kernel via Hessian scaling, applicable to the standard classes of neural networks. Our analysis provides a new perspective on the phenomenon of the constant tangent kernel, distinct from the widely accepted "lazy training". Furthermore, we show that the transition to linearity is not a general property of wide neural networks: it does not hold when the last layer of the network is non-linear. Nor is it necessary for successful optimization by gradient descent.
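To make the claim concrete: the tangent kernel of a model f(theta; x) is K(x, x') = <grad_theta f(theta; x), grad_theta f(theta; x')>, and the abstract's mechanism is that when the spectral norm of the Hessian of f shrinks with the width m (roughly as O(1/sqrt(m)) under standard parameterizations), the gradient, and hence K, is nearly constant in any fixed-radius ball around initialization. The sketch below is not the paper's code; it checks this numerically for a two-layer tanh network f(x) = v @ tanh(W @ x) / sqrt(m), where the input dimension, batch size, widths, and perturbation radius R are all illustrative choices.

```python
# A minimal numerical sketch (not the paper's code): the tangent kernel
# K(x, x') = <grad_theta f(x), grad_theta f(x')> of a two-layer tanh network
# f(x) = v @ tanh(W @ x) / sqrt(m) becomes nearly constant, as the width m
# grows, within a ball of fixed radius around a random initialization.

import numpy as np

rng = np.random.default_rng(0)

def grad_f(W, v, x, m):
    """Flattened gradient of f(x) = v @ tanh(W @ x) / sqrt(m) w.r.t. (W, v)."""
    act = np.tanh(W @ x)                                 # shape (m,)
    dv = act / np.sqrt(m)                                # df/dv_i
    dW = np.outer(v * (1.0 - act ** 2), x) / np.sqrt(m)  # df/dW_ij, shape (m, d)
    return np.concatenate([dW.ravel(), dv])

def kernel_matrix(W, v, xs, m):
    """Empirical tangent kernel on a batch of inputs (rows of xs)."""
    G = np.stack([grad_f(W, v, x, m) for x in xs])       # one gradient per input
    return G @ G.T

d = 10                                                   # illustrative input dim
xs = rng.standard_normal((3, d))                         # small illustrative batch
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
R = 1.0                                                  # fixed perturbation radius

for m in (100, 1_000, 10_000, 100_000):
    W, v = rng.standard_normal((m, d)), rng.standard_normal(m)
    K0 = kernel_matrix(W, v, xs, m)

    # Move the parameters in a random direction of Euclidean norm R. The change
    # in the gradient is bounded by R times the Hessian norm, which for this
    # parameterization scales like O(1 / sqrt(m)).
    dW, dv = rng.standard_normal((m, d)), rng.standard_normal(m)
    s = R / np.sqrt(np.sum(dW ** 2) + np.sum(dv ** 2))
    K1 = kernel_matrix(W + s * dW, v + s * dv, xs, m)

    rel = np.linalg.norm(K1 - K0) / np.linalg.norm(K0)   # Frobenius norms
    print(f"m = {m:>7d}   relative kernel change = {rel:.2e}")
```

Under these assumptions the printed relative change should shrink with m, roughly mirroring the 1/sqrt(m) Hessian scaling; per the abstract, this effect disappears when the last layer is made non-linear, since the transition to linearity then fails.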


Related research

- 03/10/2022: Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models
  Wide neural networks with linear output layer have been shown to be near...

- 03/14/2022: Phenomenology of Double Descent in Finite-Width Neural Networks
  'Double descent' delineates the generalization behaviour of models depen...

- 05/24/2022: Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
  In this paper we show that feedforward neural networks corresponding to ...

- 02/20/2020: Kernel and Rich Regimes in Overparametrized Models
  A recent line of work studies overparametrized neural networks in the "k...

- 06/13/2019: Kernel and Deep Regimes in Overparametrized Models
  A recent line of work studies overparametrized neural networks in the "k...

- 06/30/2022: A note on Linear Bottleneck networks and their Transition to Multilinearity
  Randomly initialized wide neural networks transition to linear functions...

- 06/20/2022: Limitations of the NTK for Understanding Generalization in Deep Learning
  The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018), and its empirical ...
