Synthetic Returns for Long-Term Credit Assignment

by   David Raposo, et al.

Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two. In this work, we use SA-learning to model the contribution of past states to the current reward. With this model we can predict each state's contribution to the far future, a quantity we call "synthetic returns". TD-learning can then be applied to select actions that maximize these synthetic returns (SRs). We demonstrate the effectiveness of augmenting agents with SRs across a range of tasks on which TD-learning alone fails. We show that the learned SRs are interpretable: they spike for states that occur after critical actions are taken. Finally, we show that our IMPALA-based SR agent solves Atari Skiing – a game with a lengthy reward delay that posed a major hurdle to deep-RL agents – 25 times faster than the published state-of-the-art.


page 3

page 4

page 7

page 9

page 11

page 12

page 13

page 14


Sparse Attentive Backtracking: Temporal CreditAssignment Through Reminding

Learning long-term dependencies in extended temporal sequences requires ...

Probabilistic Successor Representations with Kalman Temporal Differences

The effectiveness of Reinforcement Learning (RL) depends on an animal's ...

Optimizing Agent Behavior over Long Time Scales by Transporting Value

Humans spend a remarkable fraction of waking life engaged in acts of "me...

Learning Guidance Rewards with Trajectory-space Smoothing

Long-term temporal credit assignment is an important challenge in deep r...

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

Credit assignment in reinforcement learning is the problem of measuring ...

Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning

Recent advances in deep reinforcement learning algorithms have shown gre...

Short-term plasticity as cause-effect hypothesis testing in distal reward learning

Asynchrony, overlaps and delays in sensory-motor signals introduce ambig...