The Challenges of Exploration for Offline Reinforcement Learning

01/27/2022
by Nathan Lambert, et al.

Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but the collection of informative data is just as critical to data-efficient RL. The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest because a single dataset can be collected once and then used to solve several downstream tasks as they arise. We investigate this setting via curiosity-based intrinsic motivation, a family of exploration methods that encourage the agent to explore those states or transitions it has not yet learned to model. With Explore2Offline, we propose to evaluate the quality of collected data by transferring it to the offline setting and inferring policies from it with reward relabelling and standard offline RL algorithms. Using this scheme, we evaluate a wide variety of data collection strategies, including a new exploration agent, Intrinsic Model Predictive Control (IMPC), and demonstrate their performance on various tasks. We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
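The two mechanisms the abstract relies on can be illustrated with a small sketch. This is not the paper's IMPC agent or the Explore2Offline code; it assumes a hypothetical linear forward model and a hypothetical goal-reaching task, and all names (`Transition`, `predict_next`, `curiosity_reward`, `goal_reward`, `relabel`) are illustrative. It shows (1) a curiosity-style intrinsic reward computed as the forward model's prediction error, so transitions the agent has not yet learned to model earn a larger bonus during collection, and (2) reward relabelling, where a task-agnostic dataset is stamped with a downstream task's reward after collection, before being handed to an offline RL algorithm.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass(frozen=True)
class Transition:
    state: Vector
    action: Vector
    next_state: Vector
    reward: float = 0.0  # task-agnostic collection stores no task reward


def predict_next(state: Vector, action: Vector, weights: Sequence[Vector]) -> Vector:
    # Hypothetical linear forward model: s' ~ s + W a (one row of W per state dimension).
    return [s + sum(w * a for w, a in zip(row, action)) for s, row in zip(state, weights)]


def curiosity_reward(t: Transition, weights: Sequence[Vector]) -> float:
    # Intrinsic reward: squared prediction error of the forward model.
    # Transitions the model cannot yet predict earn a larger exploration bonus.
    pred = predict_next(t.state, t.action, weights)
    return sum((p - n) ** 2 for p, n in zip(pred, t.next_state))


def goal_reward(goal: Vector) -> Callable[[Transition], float]:
    # Hypothetical downstream task: negative squared distance from s' to a goal.
    def reward(t: Transition) -> float:
        return -sum((n - g) ** 2 for n, g in zip(t.next_state, goal))
    return reward


def relabel(dataset: Sequence[Transition],
            task_reward: Callable[[Transition], float]) -> List[Transition]:
    # Reward relabelling: keep (s, a, s') fixed and attach the reward of a task
    # that was only specified after the data was collected.
    return [replace(t, reward=task_reward(t)) for t in dataset]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]
    dataset = [
        Transition([0.0, 0.0], [1.0, 0.0], [1.0, 0.0]),  # well modelled
        Transition([0.0, 0.0], [0.0, 1.0], [0.0, 3.0]),  # surprising to the model
    ]
    print("curiosity:", [curiosity_reward(t, weights) for t in dataset])
    # The same dataset is reused for two different downstream tasks.
    for goal in ([1.0, 0.0], [0.0, 3.0]):
        print("goal", goal, [t.reward for t in relabel(dataset, goal_reward(goal))])
```

Because relabelling leaves states, actions, and next states untouched, the dataset is collected once and only the reward column changes per task; in a real pipeline the relabelled transitions would then be passed to an off-the-shelf offline RL learner rather than printed.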
