Recall Traces: Backtracking Models for Efficient Reinforcement Learning

by   Anirudh Goyal, et al.

In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and sample for which the (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.



There are no comments yet.


page 7

page 12

page 13


Internal Model from Observations for Reward Shaping

Reinforcement learning methods require careful design involving a reward...

Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement Learning

Reinforcement learning, which acquires a policy maximizing long-term rew...

DQN with model-based exploration: efficient learning on environments with sparse rewards

We propose Deep Q-Networks (DQN) with model-based exploration, an algori...

Assured RL: Reinforcement Learning with Almost Sure Constraints

We consider the problem of finding optimal policies for a Markov Decisio...

Reward Shaping with Dynamic Trajectory Aggregation

Reinforcement learning, which acquires a policy maximizing long-term rew...

Deep Reinforcement Learning for Dynamic Spectrum Sharing of LTE and NR

In this paper, a proactive dynamic spectrum sharing scheme between 4G an...

PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

Learning good feature representations is important for deep reinforcemen...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.