Action-modulated midbrain dopamine activity arises from distributed control policies

07/01/2022
by Jack Lindsey, et al.

Animal behavior is driven by multiple brain regions working in parallel with distinct control policies. We present a biologically plausible model of off-policy reinforcement learning in the basal ganglia, which enables learning in such an architecture. The model accounts for action-related modulation of dopamine activity that is not captured by previous models that implement on-policy algorithms. In particular, the model predicts that dopamine activity signals a combination of reward prediction error (as in classic models) and "action surprise," a measure of how unexpected an action is relative to the basal ganglia's current policy. In the presence of the action surprise term, the model implements an approximate form of Q-learning. On benchmark navigation and reaching tasks, we show empirically that this model is capable of learning from data driven completely or in part by other policies (e.g., from other brain regions). By contrast, models without the action surprise term suffer in the presence of additional policies and are incapable of learning at all from behavior that is completely externally driven. The model provides a computational account of numerous experimental findings about dopamine activity that cannot be explained by classic models of reinforcement learning in the basal ganglia. These include differing levels of action surprise signals in dorsal and ventral striatum, decreasing amounts of movement-modulated dopamine activity with practice, and representations of action initiation and kinematics in dopamine activity. It also provides further predictions that can be tested with recordings of striatal dopamine activity.
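
The abstract does not state the model's equations, so the following is only an illustrative sketch of how a reward prediction error plus an "action surprise" term can add up to a Q-learning error. It assumes a tabular value table `Q`, takes the state value to be the greedy value max over actions of `Q[s]`, and defines action surprise as that greedy value minus `Q[s, a]`. These definitions, the function name `dopamine_signal`, and the toy numbers are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def dopamine_signal(Q, s, a, r, s_next, gamma=0.9):
    """Split a Q-learning TD error into a state-value RPE and an action-surprise term.

    Assumed definitions (hypothetical, not taken from the paper):
      V(s)     = max_a Q[s, a]              greedy state value
      rpe      = r + gamma * V(s') - V(s)   classic reward prediction error
      surprise = V(s) - Q[s, a]             zero for the greedy action, positive otherwise
    """
    v_s = Q[s].max()
    v_next = Q[s_next].max()
    rpe = r + gamma * v_next - v_s
    surprise = v_s - Q[s, a]
    return rpe, surprise, rpe + surprise

# Toy table: 2 states x 2 actions. Action 1 in state 0 is non-greedy.
Q = np.array([[1.0, 0.2],
              [0.5, 0.0]])
rpe, surprise, delta = dopamine_signal(Q, s=0, a=1, r=0.0, s_next=1)

# The sum matches the standard Q-learning error r + gamma * max Q[s'] - Q[s, a].
q_learning_error = 0.0 + 0.9 * Q[1].max() - Q[0, 1]
print(np.isclose(delta, q_learning_error))
```

Under these assumptions the surprise term vanishes whenever the executed action is the one the table itself prefers, so the signal reduces to the classic prediction error; it becomes positive when the action came from elsewhere, which is the case the abstract describes for externally driven behavior.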


