Hierarchical Imitation and Reinforcement Learning

03/01/2018
by Hoang M. Le, et al.

We study the problem of learning policies over long time horizons. We present a framework that integrates two key concepts. First, we use hierarchical policy classes that enable planning over different time scales: the high-level planner proposes a sequence of subgoals for the low-level planner to achieve. Second, we use expert demonstrations within the hierarchical action space to dramatically reduce the cost of exploration. Our framework is flexible and can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels of the hierarchy. Using long-horizon benchmarks, including Montezuma's Revenge, we empirically demonstrate that our approach can learn significantly faster than hierarchical RL and can be significantly more label- and sample-efficient than flat IL. We also provide a theoretical analysis of the labeling cost for certain instantiations of our framework.
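The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D corridor environment, the hand-written policies, and every name below (`high_level_policy`, `low_level_policy`, `run_hierarchical`, `stride`) are hypothetical and chosen only for illustration. In the paper's setting each level would be learned with IL or RL rather than hard-coded.

```python
from typing import List, Tuple

State = int    # position in a hypothetical 1-D corridor
Subgoal = int  # target position proposed by the high level
Action = int   # -1 (step left) or +1 (step right)


def high_level_policy(state: State, goal: State, stride: int = 3) -> Subgoal:
    """Propose a waypoint at most `stride` cells closer to the final goal."""
    if goal >= state:
        return min(state + stride, goal)
    return max(state - stride, goal)


def low_level_policy(state: State, subgoal: Subgoal) -> Action:
    """Take one primitive step toward the current subgoal."""
    return 1 if subgoal > state else -1


def run_hierarchical(
    start: State, goal: State, max_hl_steps: int = 20, max_ll_steps: int = 5
) -> Tuple[State, List[Tuple[Subgoal, State]]]:
    """Alternate between the two time scales until the goal is reached.

    Returns the final state and a trace of (subgoal, state reached) pairs,
    one pair per high-level decision.
    """
    state = start
    trace: List[Tuple[Subgoal, State]] = []
    for _ in range(max_hl_steps):
        if state == goal:
            break
        # Slow time scale: pick the next subgoal.
        subgoal = high_level_policy(state, goal)
        # Fast time scale: act until the subgoal is met or the budget runs out.
        for _ in range(max_ll_steps):
            if state == subgoal:
                break
            state += low_level_policy(state, subgoal)
        trace.append((subgoal, state))
    return state, trace


if __name__ == "__main__":
    final_state, trace = run_hierarchical(start=0, goal=7)
    print(final_state, trace)
```

In a learned version of this loop, expert feedback could be requested at either level: a label on whether the proposed subgoal was correct, or on the primitive actions taken within a subgoal. That is the kind of hierarchical action space the abstract refers to.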


research
03/26/2023

Inverse Reinforcement Learning without Reinforcement Learning

Inverse Reinforcement Learning (IRL) is a powerful set of techniques for...
research
12/07/2021

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

Learning rational behaviors in open-world games like Minecraft remains t...
research
03/25/2021

Self-Imitation Learning by Planning

Imitation learning (IL) enables robots to acquire skills quickly by tran...
research
10/26/2018

Neural Modular Control for Embodied Question Answering

We present a modular approach for learning policies for navigation over ...
research
07/01/2020

Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

Autonomous driving has achieved significant progress in recent years, bu...
research
03/22/2021

Online Baum-Welch algorithm for Hierarchical Imitation Learning

The options framework for hierarchical reinforcement learning has increa...
research
06/11/2019

Wasserstein Reinforcement Learning

We propose behavior-driven optimization via Wasserstein distances (WDs) ...
