Synthetic Returns for Long-Term Credit Assignment

02/24/2021
by   David Raposo, et al.
6

Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two. In this work, we use SA-learning to model the contribution of past states to the current reward. With this model we can predict each state's contribution to the far future, a quantity we call "synthetic returns". TD-learning can then be applied to select actions that maximize these synthetic returns (SRs). We demonstrate the effectiveness of augmenting agents with SRs across a range of tasks on which TD-learning alone fails. We show that the learned SRs are interpretable: they spike for states that occur after critical actions are taken. Finally, we show that our IMPALA-based SR agent solves Atari Skiing – a game with a lengthy reward delay that posed a major hurdle to deep-RL agents – 25 times faster than the published state-of-the-art.

READ FULL TEXT

page 3

page 4

page 7

page 9

page 11

page 12

page 13

page 14

09/11/2018

Sparse Attentive Backtracking: Temporal CreditAssignment Through Reminding

Learning long-term dependencies in extended temporal sequences requires ...
10/06/2019

Probabilistic Successor Representations with Kalman Temporal Differences

The effectiveness of Reinforcement Learning (RL) depends on an animal's ...
10/15/2018

Optimizing Agent Behavior over Long Time Scales by Transporting Value

Humans spend a remarkable fraction of waking life engaged in acts of "me...
10/23/2020

Learning Guidance Rewards with Trajectory-space Smoothing

Long-term temporal credit assignment is an important challenge in deep r...
11/18/2020

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

Credit assignment in reinforcement learning is the problem of measuring ...
05/31/2019

Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning

Recent advances in deep reinforcement learning algorithms have shown gre...
02/04/2014

Short-term plasticity as cause-effect hypothesis testing in distal reward learning

Asynchrony, overlaps and delays in sensory-motor signals introduce ambig...