Action-modulated midbrain dopamine activity arises from distributed control policies

by   Jack Lindsey, et al.

Animal behavior is driven by multiple brain regions working in parallel with distinct control policies. We present a biologically plausible model of off-policy reinforcement learning in the basal ganglia, which enables learning in such an architecture. The model accounts for action-related modulation of dopamine activity that is not captured by previous models that implement on-policy algorithms. In particular, the model predicts that dopamine activity signals a combination of reward prediction error (as in classic models) and "action surprise," a measure of how unexpected an action is relative to the basal ganglia's current policy. In the presence of the action surprise term, the model implements an approximate form of Q-learning. On benchmark navigation and reaching tasks, we show empirically that this model is capable of learning from data driven completely or in part by other policies (e.g. from other brain regions). By contrast, models without the action surprise term suffer in the presence of additional policies, and are incapable of learning at all from behavior that is completely externally driven. The model provides a computational account for numerous experimental findings about dopamine activity that cannot be explained by classic models of reinforcement learning in the basal ganglia. These include differing levels of action surprise signals in dorsal and ventral striatum, decreasing amounts movement-modulated dopamine activity with practice, and representations of action initiation and kinematics in dopamine activity. It also provides further predictions that can be tested with recordings of striatal dopamine activity.


page 1

page 2

page 3

page 4


Towards sample-efficient episodic control with DAC-ML

The sample-inefficiency problem in Artificial Intelligence refers to the...

Imminent Collision Mitigation with Reinforcement Learning and Vision

This work examines the role of reinforcement learning in reducing the se...

Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

We study the efficient off-policy evaluation of natural stochastic polic...

Off-Policy Shaping Ensembles in Reinforcement Learning

Recent advances of gradient temporal-difference methods allow to learn o...

Distinguishing Learning Rules with Brain Machine Interfaces

Despite extensive theoretical work on biologically plausible learning ru...

Towards Theoretical Understanding of Data-Driven Policy Refinement

This paper presents an approach for data-driven policy refinement in rei...

Towards Data-Driven Affirmative Action Policies under Uncertainty

In this paper, we study university admissions under a centralized system...

Please sign up or login with your details

Forgot password? Click here to reset