Scaling Forward Gradient With Local Losses

10/07/2022
by Mengye Ren, et al.

Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. However, the standard forward gradient algorithm, when applied naively, suffers from high variance when the number of parameters to be learned is large. In this paper, we propose a series of architectural and algorithmic modifications that together make forward gradient learning practical for standard deep learning benchmark tasks. We show that it is possible to substantially reduce the variance of the forward gradient estimator by applying perturbations to activations rather than weights. We further improve the scalability of forward gradient by introducing a large number of local greedy loss functions, each of which involves only a small number of learnable parameters, and a new MLPMixer-inspired architecture, LocalMixer, that is more suitable for local learning. Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
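As a rough illustration of the variance argument above, the sketch below contrasts a weight-perturbed forward gradient with an activation-perturbed one, using JAX forward-mode Jacobian-vector products. It assumes a single linear layer with a squared-error loss; the function names and setup are illustrative for this example only and are not taken from the paper's LocalMixer implementation.

import jax
import jax.numpy as jnp

def loss_from_weights(w, x, y):
    # Single linear layer followed by a squared-error loss.
    z = x @ w
    return 0.5 * jnp.sum((z - y) ** 2)

def loss_from_activations(z, y):
    return 0.5 * jnp.sum((z - y) ** 2)

def forward_grad_weights(w, x, y, key):
    # Weight perturbation: sample a direction v in weight space, compute the
    # directional derivative (dL/dw . v) with one forward-mode JVP, and use
    # (dL/dw . v) v as the gradient estimate. Unbiased, but its variance
    # grows with the number of perturbed parameters.
    v = jax.random.normal(key, w.shape)
    _, dirderiv = jax.jvp(lambda w_: loss_from_weights(w_, x, y), (w,), (v,))
    return dirderiv * v

def forward_grad_activations(w, x, y, key):
    # Activation perturbation: perturb the (typically much lower-dimensional)
    # activations instead, estimate dL/dz the same way, then map the estimate
    # back to the weights through the exact local Jacobian (x^T g_z for a
    # linear layer).
    z = x @ w
    u = jax.random.normal(key, z.shape)
    _, dirderiv = jax.jvp(lambda z_: loss_from_activations(z_, y), (z,), (u,))
    g_z = dirderiv * u
    return x.T @ g_z

key = jax.random.PRNGKey(0)
x = jax.random.normal(jax.random.fold_in(key, 0), (32, 64))   # batch of inputs
w = 0.1 * jax.random.normal(jax.random.fold_in(key, 1), (64, 10))
y = jax.random.normal(jax.random.fold_in(key, 2), (32, 10))   # random targets

g_w = forward_grad_weights(w, x, y, jax.random.fold_in(key, 3))
g_a = forward_grad_activations(w, x, y, jax.random.fold_in(key, 4))
print(g_w.shape, g_a.shape)   # both estimates have the weight shape (64, 10)

Both estimators are unbiased for the true weight gradient; the activation-perturbed version injects noise into far fewer dimensions per example, which is the variance reduction the abstract refers to.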


