
Direct Feedback Alignment Provides Learning in Deep Neural Networks
Artificial neural networks are most commonly trained with the backpropa...

Training DNNs in O(1) memory with MEMDFA using Random Matrices
This work presents a method for reducing memory consumption to a constan...

Principled Training of Neural Networks with Direct Feedback Alignment
The backpropagation algorithm has long been the canonical training metho...

Feedback alignment in deep convolutional networks
Ongoing studies have identified similarities between neural representati...

Hardware Beyond Backpropagation: a Photonic CoProcessor for Direct Feedback Alignment
The scaling hypothesis motivates the expansion of models past trillions ...

Direct Feedback Alignment with Sparse Connections for Local Learning
Recent advances in deep neural networks (DNNs) owe their success to trai...

Dynamic Weight Alignment for Convolutional Neural Networks
In this paper, we propose a method of improving Convolutional Neural Net...
The dynamics of learning with feedback alignment
Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to the ubiquitous backpropagation algorithm for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. On the other hand, it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA to explain these diverging results remains elusive. Here, we propose a theory for the success of DFA. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment, and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR-10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorise process occurs sequentially from the bottom layers of the network to the top.
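To make the mechanism concrete, here is a minimal sketch of DFA on a shallow network: the output error is projected to the hidden layer through a fixed random matrix rather than the transposed forward weights that backpropagation would use, and the resulting hidden-layer update can be compared against the true backprop update to measure gradient alignment. The toy task, network sizes, and learning rate are illustrative assumptions, not taken from the paper.

```python
# Minimal DFA sketch on a shallow tanh network (toy setup, not the
# paper's experiments).
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn y = sum(x) from 8-dimensional inputs.
X = rng.standard_normal((256, 8))
y = X.sum(axis=1, keepdims=True)

W1 = rng.standard_normal((8, 32)) * 0.1   # trained forward weights
W2 = rng.standard_normal((32, 1)) * 0.1
B1 = rng.standard_normal((1, 32))         # fixed random feedback matrix


def forward(X):
    h1 = np.tanh(X @ W1)
    return h1, h1 @ W2


h1, out = forward(X)
initial_loss = float(np.mean((out - y) ** 2))

lr = 0.05
for _ in range(500):
    h1, out = forward(X)
    e = out - y                           # global output error

    # DFA backward pass: project the output error straight to the hidden
    # layer through the fixed random matrix B1, instead of through W2.T
    # as backpropagation would.
    d1_dfa = (e @ B1) * (1.0 - h1 ** 2)

    W2 -= lr * h1.T @ e / len(X)          # last layer sees the true gradient
    W1 -= lr * X.T @ d1_dfa / len(X)

h1, out = forward(X)
final_loss = float(np.mean((out - y) ** 2))

# Gradient alignment: cosine similarity between the DFA hidden-layer
# update and the true backprop update at the trained weights.
e = out - y
g_dfa = X.T @ ((e @ B1) * (1.0 - h1 ** 2))
g_bp = X.T @ ((e @ W2.T) * (1.0 - h1 ** 2))
alignment = float((g_dfa * g_bp).sum()
                  / (np.linalg.norm(g_dfa) * np.linalg.norm(g_bp)))
print(initial_loss, final_loss, alignment)
```

In the regime the abstract describes, the alignment phase drives this cosine well above zero before the network settles into fitting the data, which is why the random feedback matrix can nevertheless deliver useful weight updates.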