Recall Traces: Backtracking Models for Efficient Reinforcement Learning

04/02/2018
by   Anirudh Goyal, et al.
0

In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and sample for which the (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.

READ FULL TEXT

page 7

page 12

page 13

research
08/04/2022

Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

Traditional model-based reinforcement learning (RL) methods generate for...
research
06/02/2018

Internal Model from Observations for Reward Shaping

Reinforcement learning methods require careful design involving a reward...
research
04/13/2021

Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement Learning

Reinforcement learning, which acquires a policy maximizing long-term rew...
research
12/24/2020

Assured RL: Reinforcement Learning with Almost Sure Constraints

We consider the problem of finding optimal policies for a Markov Decisio...
research
04/13/2021

Reward Shaping with Dynamic Trajectory Aggregation

Reinforcement learning, which acquires a policy maximizing long-term rew...
research
11/29/2022

Posterior Sampling for Continuing Environments

We develop an extension of posterior sampling for reinforcement learning...
research
06/08/2021

PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

Learning good feature representations is important for deep reinforcemen...

Please sign up or login with your details

Forgot password? Click here to reset