Recall Traces: Backtracking Models for Efficient Reinforcement Learning

04/02/2018
by   Anirudh Goyal, et al.
0

In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and sample for which the (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 7

page 12

page 13

06/02/2018

Internal Model from Observations for Reward Shaping

Reinforcement learning methods require careful design involving a reward...
04/13/2021

Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement Learning

Reinforcement learning, which acquires a policy maximizing long-term rew...
03/22/2019

DQN with model-based exploration: efficient learning on environments with sparse rewards

We propose Deep Q-Networks (DQN) with model-based exploration, an algori...
12/24/2020

Assured RL: Reinforcement Learning with Almost Sure Constraints

We consider the problem of finding optimal policies for a Markov Decisio...
04/13/2021

Reward Shaping with Dynamic Trajectory Aggregation

Reinforcement learning, which acquires a policy maximizing long-term rew...
02/22/2021

Deep Reinforcement Learning for Dynamic Spectrum Sharing of LTE and NR

In this paper, a proactive dynamic spectrum sharing scheme between 4G an...
06/08/2021

PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

Learning good feature representations is important for deep reinforcemen...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.