TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning

05/26/2022
by   Marco Bagatella, et al.

Efficient exploration is a crucial challenge in deep reinforcement learning. Several methods, such as behavioral priors, leverage offline data to accelerate reinforcement learning on complex tasks. However, their effectiveness is limited when the task at hand deviates excessively from the demonstrated task. In our work, we propose to learn features from offline data that are shared across a more diverse range of tasks, such as correlation between actions and directedness. To this end, we introduce state-independent temporal priors, which directly model temporal consistency in demonstrated trajectories and are capable of driving exploration in complex tasks, even when trained on data collected on simpler tasks. Furthermore, we introduce a novel integration scheme for action priors in off-policy reinforcement learning, in which actions are dynamically sampled from a probabilistic mixture of the policy and the action prior. We compare our approach against strong baselines and provide empirical evidence that it can accelerate reinforcement learning in long-horizon continuous control tasks under sparse reward settings.
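To make the mixture-sampling idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical temporal prior that conditions only on the previous action (here a Gaussian centred on it, clipped to [-1, 1]) and a mixing weight supplied by the caller. The abstract does not say how the weight is set dynamically, so `mix_weight`, `temporal_prior_sample`, and `mixture_sample` are illustrative names and assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple

Action = List[float]


def temporal_prior_sample(prev_action: Action,
                          rng: random.Random,
                          noise_std: float = 0.1) -> Action:
    """Hypothetical state-independent temporal prior.

    Assumption: the next action is drawn from a Gaussian centred on the
    previous action, which encodes temporal consistency without looking
    at the state. A learned model would replace this in practice.
    """
    return [max(-1.0, min(1.0, a + rng.gauss(0.0, noise_std)))
            for a in prev_action]


def mixture_sample(policy_sample: Callable[[Optional[object]], Action],
                   prior_sample: Callable[[Action], Action],
                   state: Optional[object],
                   prev_action: Action,
                   mix_weight: float,
                   rng: random.Random) -> Tuple[Action, bool]:
    """Draw one action from a two-component mixture.

    With probability `mix_weight` the action comes from the prior
    (conditioned on the previous action only); otherwise it comes from
    the policy (conditioned on the state). The returned flag records
    which component was used.
    """
    if not 0.0 <= mix_weight <= 1.0:
        raise ValueError("mix_weight must lie in [0, 1]")
    if rng.random() < mix_weight:
        return prior_sample(prev_action), True
    return policy_sample(state), False


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in policy: always outputs the zero action (assumption).
    policy = lambda state: [0.0, 0.0]
    prior = lambda prev: temporal_prior_sample(prev, rng)
    action, used_prior = mixture_sample(policy, prior, None,
                                        [0.5, -0.5], 0.5, rng)
    print(action, used_prior)
```

Because the behaviour actions come from a mixture rather than from the policy alone, the collected data is off-policy with respect to the learned policy, which is why such a scheme pairs naturally with an off-policy learner that stores transitions in a replay buffer.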
