An Alternative to Backpropagation in Deep Reinforcement Learning

10/15/2020
by Stephen Chung, et al.

State-of-the-art deep learning algorithms mostly rely on gradient backpropagation to train deep artificial neural networks, which is generally regarded as biologically implausible. For a network of stochastic units trained on a reinforcement learning or supervised learning task, one biologically plausible way of learning is to train each unit by REINFORCE. In this case, only a global reward signal has to be broadcast to all units, and the resulting learning rule is local; it can be interpreted as the reward-modulated spike-timing-dependent plasticity (R-STDP) observed biologically. Although this learning rule follows the gradient of return in expectation, it suffers from high variance and cannot be used to train deep networks in practice. In this paper, we propose an algorithm called MAP propagation that reduces this variance significantly while retaining the locality of the learning rule. Unlike prior work on local learning rules (e.g., Contrastive Divergence), which mostly applies to undirected models in unsupervised learning tasks, our proposed algorithm applies to directed models in reinforcement learning tasks. We show that the proposed algorithm can solve common reinforcement learning tasks at a speed similar to that of backpropagation when applied to an actor-critic network.
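To make the baseline learning rule concrete, here is a minimal sketch of REINFORCE for a single stochastic Bernoulli unit: the weight update is the global reward times a purely local eligibility term, the gradient of the log-probability of the unit's sampled output. The toy task, variable names, and learning rate are illustrative assumptions; this is the high-variance baseline the abstract describes, not the paper's MAP propagation algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reinforce_step(w, x, lr=0.1):
    """One REINFORCE update for a single stochastic Bernoulli unit.

    The reward here is a toy global signal: 1 when the unit fires,
    0 otherwise (an assumption for illustration). The eligibility
    (s - p) * x is d/dw log P(s | x) -- it uses only locally
    available quantities.
    """
    p = sigmoid(w @ x)               # firing probability
    s = float(rng.random() < p)      # sampled binary output
    reward = s                       # global scalar broadcast to the unit
    eligibility = (s - p) * x        # local gradient of log-probability
    return w + lr * reward * eligibility, s

# Toy run: rewarding firing should drive the firing probability toward 1.
w = np.zeros(3)
x = np.array([1.0, 0.5, -0.5])
for _ in range(500):
    w, s = reinforce_step(w, x)
```

In a deep network, every hidden unit would apply the same rule with the same broadcast reward; the expected update follows the gradient of return, but the variance of these sampled updates is what makes the naive scheme impractical at depth.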


