Explore then Execute: Adapting without Rewards via Factorized Meta-Reinforcement Learning

08/06/2020
by   Evan Zheran Liu, et al.

We seek to learn efficiently by leveraging shared structure between different tasks and environments. For example, cooking is similar in different kitchens, even though the ingredients may change location. In principle, meta-reinforcement learning approaches can exploit this shared structure, but in practice, they fail to adapt to new environments when adaptation requires targeted exploration (e.g., exploring the cabinets to find ingredients in a new kitchen). We show that existing approaches fail due to a chicken-and-egg problem: learning what to explore requires knowing what information is critical for solving the task, but learning to solve the task requires having already gathered this information via exploration. For example, exploring to find the ingredients only helps a robot prepare a meal if it already knows how to cook, but the robot can only learn to cook if it already knows where the ingredients are. To address this, we propose a new exploration objective (DREAM), based on identifying key information in the environment, independent of exactly how this information will be used to solve the task. By decoupling exploration from task execution, DREAM explores and consequently adapts to new environments, requiring no reward signal when the task is specified via an instruction. Empirically, DREAM scales to more complex problems, such as sparse-reward 3D visual navigation, while existing approaches fail from insufficient exploration.
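
The explore-then-execute split described above can be illustrated with a small hand-coded toy. This is only a sketch of the episode structure, not the DREAM objective or any learned policy: the kitchen, the cabinet names, and the functions `make_kitchen`, `explore`, and `execute` are all hypothetical and do not come from the paper. The exploration episode sees no reward and no instruction; the execution episode receives an instruction and relies only on what exploration recorded.

```python
import random

# Hypothetical toy setting: three cabinets, two ingredients.
CABINETS = ["left", "middle", "right"]
INGREDIENTS = ["flour", "eggs"]


def make_kitchen(rng):
    """Sample a new kitchen: each ingredient sits in a random cabinet."""
    return {ingredient: rng.choice(CABINETS) for ingredient in INGREDIENTS}


def explore(kitchen):
    """Exploration episode: open every cabinet and note what is inside.

    No reward and no instruction are used; the agent only gathers information.
    """
    notes = {}
    for cabinet in CABINETS:
        for ingredient, location in kitchen.items():
            if location == cabinet:
                notes[ingredient] = cabinet
    return notes


def execute(instruction, notes):
    """Execution episode: go to the cabinet that exploration recorded
    for the ingredient named in the instruction."""
    return notes[instruction]


rng = random.Random(0)
kitchen = make_kitchen(rng)   # a new, unseen kitchen
notes = explore(kitchen)      # reward-free exploration
for ingredient in INGREDIENTS:
    # The instruction alone specifies the task; no reward is consulted.
    assert execute(ingredient, notes) == kitchen[ingredient]
```

In this toy, exploration is exhaustive and fixed by hand. The abstract's point is that a learned exploration policy should find such task-relevant information without depending on how the execution policy will later use it.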

