Every Hidden Unit Maximizing Output Weights Maximizes The Global Reward

10/19/2020
by Stephen Chung, et al.

For a network of stochastic units trained on a reinforcement learning task, one biologically plausible way of learning is to treat each unit as a reinforcement learning unit and train every unit by REINFORCE using the same global reward signal. In this case, only a global reward signal has to be broadcast to all units, and the resulting learning rule is local. Although this learning rule follows the gradient of the expected return in expectation, it suffers from high variance and cannot be used to train a deep network in practice. In this paper, we propose an algorithm called Weight Maximization, which can significantly speed up the application of REINFORCE to all units. Essentially, we replace the global reward delivered to each hidden unit with the change in the norm of that unit's output weights, so that each hidden unit tries to maximize the norm of its output weights instead of the global reward. We find that the new algorithm solves simple reinforcement learning tasks significantly faster than the baseline model. We also prove that, when applied to a multi-layer network of Bernoulli logistic units, the resulting learning rule approximately follows gradient ascent on the expected reward. This illustrates intelligent behavior arising from a population of self-interested, hedonistic neurons, in line with Klopf's hedonistic neuron hypothesis.
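The mechanism can be sketched as follows: the output unit is trained by ordinary REINFORCE with the global reward, while each hidden unit also runs REINFORCE but receives the change in the norm of its outgoing weights as its reward. Below is a minimal illustrative sketch for a one-hidden-layer network of Bernoulli logistic units on a toy immediate-reward task; the toy task, hyperparameters, scaling choices, and variable names are assumptions made for this sketch and are not taken from the paper.

import numpy as np

# A minimal sketch of the Weight Maximization idea for a one-hidden-layer
# network of Bernoulli logistic units on a toy immediate-reward task.
# The toy task, hyperparameters, and variable names are illustrative
# assumptions, not taken from the paper.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 4, 8
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))  # input -> hidden weights
w2 = rng.normal(scale=0.1, size=n_hid)          # hidden -> output weights
lr = 0.05

def toy_reward(x, a):
    # Toy task (an assumption for this sketch): reward 1 if the output
    # action reproduces the first input bit, 0 otherwise.
    return 1.0 if a == x[0] else 0.0

for episode in range(20000):
    x = rng.integers(0, 2, size=n_in).astype(float)

    # Forward pass: every unit is a stochastic Bernoulli logistic unit.
    p_hid = sigmoid(W1 @ x)
    h = (rng.random(n_hid) < p_hid).astype(float)
    p_out = sigmoid(w2 @ h)
    a = float(rng.random() < p_out)

    R = toy_reward(x, a)

    # Output unit: plain REINFORCE with the global reward R.
    # For a Bernoulli logistic unit, d log p(a) / d w2 = (a - p_out) * h.
    w2_old = w2.copy()
    w2 = w2 + lr * R * (a - p_out) * h

    # Hidden units: also REINFORCE, but each unit's reward is the change in
    # the norm of its outgoing weights (a single scalar weight per unit
    # here) instead of the global reward R. The 1/lr scaling keeps the two
    # layers learning at comparable rates; it is a choice of this sketch.
    r_hid = (np.abs(w2) - np.abs(w2_old)) / lr
    W1 = W1 + lr * (r_hid * (h - p_hid))[:, None] * x[None, :]

In this sketch each hidden unit's "reward" is positive exactly when the output layer's update increased the magnitude of the weight reading out that unit, which is the sense in which the hidden units maximize their output-weight norm rather than the global reward.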


Related research

10/15/2020: An Alternative to Backpropagation in Deep Reinforcement Learning
State-of-the-art deep learning algorithms mostly rely on gradient backpr...

07/25/2023: Unbiased Weight Maximization
A biologically plausible method for training an Artificial Neural Networ...

07/25/2023: Structural Credit Assignment with Coordinated Exploration
A biologically plausible method for training an Artificial Neural Networ...

08/29/2020: Reinforcement Learning with Feedback-modulated TD-STDP
Spiking neuron networks have been used successfully to solve simple rein...

05/29/2021: Gradient-Free Neural Network Training via Synaptic-Level Reinforcement Learning
An ongoing challenge in neural information processing is: how do neurons...

09/04/2020: Policy Gradient Reinforcement Learning for Policy Represented by Fuzzy Rules: Application to Simulations of Speed Control of an Automobile
A method of a fusion of fuzzy inference and policy gradient reinforcemen...

10/14/2021: Sign and Relevance learning
Standard models of biologically realistic, or inspired, reinforcement le...
