Flexible and Efficient Long-Range Planning Through Curious Exploration

04/22/2020
by Aidan Curtis, et al.

Identifying algorithms that flexibly and efficiently discover temporally-extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on the existence of logical descriptions of the preconditions and effects of actions. This constraint allows TAMP methods to efficiently reduce the tree search problem but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.
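The abstract's central idea, expanding a search tree over action sequences by preferentially sampling novel-looking states, can be illustrated with a short sketch. The Python snippet below is a toy approximation, not the authors' CSP implementation: the integer-valued environment, the count-based novelty score standing in for the paper's learned curiosity metrics, and all names (`step`, `curiosity_score`, `curious_sample_plan`) are assumptions made for exposition.

```python
"""Minimal sketch of curiosity-guided sample-based planning.

Illustrates the loop the abstract describes: grow a search tree by
sampling nodes to expand in proportion to a curiosity score. A simple
count-based novelty measure stands in for the learned curiosity
modules (e.g., dynamics-prediction error) used in the paper.
"""
import math
import random
from collections import defaultdict

# Toy deterministic environment: the state is an integer position,
# actions shift it; the goal is to reach position GOAL.
GOAL = 12
ACTIONS = [-1, +1, +2]

def step(state, action):
    return state + action

visit_counts = defaultdict(int)

def curiosity_score(state):
    # Count-based novelty: rarely visited states score higher.
    # CSP instead learns this signal with neural networks.
    return 1.0 / math.sqrt(1 + visit_counts[state])

def curious_sample_plan(start, max_expansions=500):
    # Each tree node stores (state, action sequence that reached it).
    tree = [(start, [])]
    visit_counts[start] += 1
    for _ in range(max_expansions):
        # Sample a node to expand, weighted by curiosity.
        weights = [curiosity_score(s) for s, _ in tree]
        state, plan = random.choices(tree, weights=weights, k=1)[0]
        action = random.choice(ACTIONS)
        nxt = step(state, action)
        visit_counts[nxt] += 1
        tree.append((nxt, plan + [action]))
        if nxt == GOAL:
            return plan + [action]
    return None  # no plan found within the budget

if __name__ == "__main__":
    print("plan:", curious_sample_plan(start=0))
```

Weighting node selection by novelty biases expansion toward unexplored regions of state space, which is what lets this style of planner make progress even when the reward signal is as sparse as the abstract describes.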

