The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

by   Blake Bordelon, et al.

It is unclear how changing the learning rule of a deep neural network alters its learning dynamics and representations. To gain insight into the relationship between learned features, function approximation, and the learning rule, we analyze infinite-width deep networks trained with gradient descent (GD) and biologically-plausible alternatives including feedback alignment (FA), direct feedback alignment (DFA), and error modulated Hebbian learning (Hebb), as well as gated linear networks (GLN). We show that, for each of these learning rules, the evolution of the output function at infinite width is governed by a time varying effective neural tangent kernel (eNTK). In the lazy training limit, this eNTK is static and does not evolve, while in the rich mean-field regime this kernel's evolution can be determined self-consistently with dynamical mean field theory (DMFT). This DMFT enables comparisons of the feature and prediction dynamics induced by each of these learning rules. In the lazy limit, we find that DFA and Hebb can only learn using the last layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between feedforward and feedback weight matrices. In the rich regime, DFA and FA utilize a temporally evolving and depth-dependent NTK. Counterintuitively, we find that FA networks trained in the rich regime exhibit more feature learning if initialized with smaller correlation between the forward and backward pass weights. GLNs admit a very simple formula for their lazy limit kernel and preserve conditional Gaussianity of their preactivations under gating functions. Error modulated Hebb rules show very small task-relevant alignment of their kernels and perform most task relevant learning in the last layer.


page 1

page 2

page 3

page 4


Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

We analyze the dynamics of finite width effects in wide but finite featu...

Global Optimality of Elman-type RNN in the Mean-Field Regime

We analyze Elman-type Recurrent Reural Networks (RNNs) and their trainin...

Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

We analyze feature learning in infinite width neural networks trained wi...

An analytic theory of shallow networks dynamics for hinge loss classification

Neural networks have been shown to perform incredibly well in classifica...

Rapid Feature Evolution Accelerates Learning in Neural Networks

Neural network (NN) training and generalization in the infinite-width li...

Neural Networks as Kernel Learners: The Silent Alignment Effect

Neural networks in the lazy training regime converge to kernel machines....

Globally Gated Deep Linear Networks

Recently proposed Gated Linear Networks present a tractable nonlinear ne...

Please sign up or login with your details

Forgot password? Click here to reset