Rapid Feature Evolution Accelerates Learning in Neural Networks

05/29/2021
by Haozhe Shan, et al.

Neural network (NN) training and generalization in the infinite-width limit are well characterized by kernel methods with a neural tangent kernel (NTK) that is stationary in time. However, finite-width NNs consistently outperform the corresponding kernel methods, suggesting the importance of feature learning, which manifests as the time evolution of the NTK. Here, we analyze the phenomenon of kernel alignment of the NTK with the target function during gradient descent. We first provide a mechanistic explanation for why alignment between task and kernel occurs in deep linear networks. We then show that this behavior occurs more generally if one optimizes the feature map over time to accelerate learning while constraining how quickly the features evolve. Empirically, gradient descent undergoes a feature learning phase, during which the top eigenfunctions of the NTK quickly align with the target function and the loss decreases faster than a power law in time; it then enters a kernel gradient descent (KGD) phase, in which the alignment does not improve significantly and the training loss decreases as a power law. We show that feature evolution is faster and more dramatic in deeper networks. We also find that networks with multiple output nodes develop separate, specialized kernels for each output channel, a phenomenon we term kernel specialization. We show that this class-specific alignment does not occur in linear networks.
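
To make the notion of kernel alignment concrete, the following is a minimal sketch, not the authors' code. It assumes alignment is measured as the cosine similarity between the NTK Gram matrix K and the target outer product y y^T (a common definition; the paper's exact metric may differ), and it uses a two-layer linear network f(x) = a^T W x as the simplest deep linear network. All function names, sizes, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def kernel_alignment(K, y):
    """Cosine similarity between Gram matrix K and y y^T (assumed metric)."""
    yyT = np.outer(y, y)
    return float(np.sum(K * yyT) / (np.linalg.norm(K) * np.linalg.norm(yyT)))

def empirical_ntk(W, a, X):
    """NTK Gram matrix of f(x) = a^T W x, from the parameter Jacobian.

    df/da = W x contributes (W x) . (W x'); df/dW = a x^T contributes |a|^2 x . x'.
    """
    H = X @ W.T                          # hidden activations, shape (n, h)
    return H @ H.T + (a @ a) * (X @ X.T)

def mse(W, a, X, y):
    return float(np.mean((X @ W.T @ a - y) ** 2))

def train(X, y, W, a, lr=0.05, steps=2000):
    """Full-batch gradient descent on the loss 0.5 * mean((f - y)^2)."""
    n = len(y)
    for _ in range(steps):
        H = X @ W.T
        r = H @ a - y                    # residuals, shape (n,)
        grad_a = H.T @ r / n
        grad_W = np.outer(a, X.T @ r) / n
        a = a - lr * grad_a
        W = W - lr * grad_W
    return W, a

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, h = 200, 10, 50
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[0] = 1.0
y = X @ beta                             # linear target along one input direction
W0 = 0.01 * rng.standard_normal((h, d))  # small init so features must evolve
a0 = 0.01 * rng.standard_normal(h)

align_before = kernel_alignment(empirical_ntk(W0, a0, X), y)
loss_before = mse(W0, a0, X, y)
W1, a1 = train(X, y, W0, a0)
align_after = kernel_alignment(empirical_ntk(W1, a1, X), y)
loss_after = mse(W1, a1, X, y)

print(f"alignment: {align_before:.3f} -> {align_after:.3f}")
print(f"mse loss:  {loss_before:.3e} -> {loss_after:.3e}")
```

In this toy setting the NTK starts out nearly proportional to X X^T and, as the first-layer weights grow along the target direction, gains a component proportional to y y^T, so the measured alignment rises during training. Recording the alignment at every step instead of only at the endpoints would show how it evolves alongside the loss.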

research
11/30/2020

Feature Learning in Infinite-Width Neural Networks

As its width tends to infinity, a deep neural network's behavior under g...
research
10/29/2021

Neural Networks as Kernel Learners: The Silent Alignment Effect

Neural networks in the lazy training regime converge to kernel machines....
research
10/19/2019

Neural Spectrum Alignment

Expressiveness of deep models was recently addressed via the connection ...
research
10/05/2022

The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

It is unclear how changing the learning rule of a deep neural network al...
research
05/19/2022

Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

We analyze feature learning in infinite width neural networks trained wi...
research
08/29/2023

An Adaptive Tangent Feature Perspective of Neural Networks

In order to better understand feature learning in neural networks, we pr...
research
11/22/2022

Learning Deep Neural Networks by Iterative Linearisation

The excellent real-world performance of deep neural networks has receive...
