Augmenting Policy Learning with Routines Discovered from a Single Demonstration

12/23/2020
by Zelin Zhao, et al.

Humans can abstract prior knowledge from very little data and use it to boost skill learning. In this paper, we propose routine-augmented policy learning (RAPL), which discovers routines composed of primitive actions from a single demonstration and uses them to augment policy learning. To discover routines, we first extract routine candidates by identifying a grammar over the demonstrated action trajectory. The best candidates, ranked by length and frequency, are then selected to form a routine library. We propose learning the policy simultaneously at the primitive level and the routine level with the discovered routines, leveraging their temporal structure. For imitation learning, our approach enables imitating expert behavior at multiple temporal scales; for reinforcement learning, it promotes exploration. Extensive experiments on Atari games demonstrate that RAPL improves the state-of-the-art imitation learning method SQIL and the reinforcement learning method A2C. Further, we show that the discovered routines generalize to unseen levels and difficulties on the CoinRun benchmark.
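
The selection step described above (rank candidates by length and frequency, keep the best as a routine library) might be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps the grammar-based candidate extraction for plain counting of repeated contiguous action subsequences, and the additive scoring rule, the weights, and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter


def candidate_routines(actions, min_len=2, max_len=4):
    """Count every contiguous action subsequence of length min_len..max_len.

    Assumption: simple n-gram counting stands in for grammar induction.
    Overlapping occurrences are counted separately.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(actions) - n + 1):
            counts[tuple(actions[i:i + n])] += 1
    return counts


def select_routines(actions, k=3, min_count=2,
                    length_weight=1.0, freq_weight=1.0):
    """Keep the k best repeated candidates.

    Assumption: the score is a weighted sum of length and frequency;
    the abstract only says both quantities are used.
    """
    counts = candidate_routines(actions)
    scored = [
        (length_weight * len(routine) + freq_weight * count, routine)
        for routine, count in counts.items()
        if count >= min_count
    ]
    # Highest score first; ties are broken by the routine itself so the
    # result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [routine for _, routine in scored[:k]]


def augmented_action_space(primitives, routines):
    """Primitive actions (as length-1 tuples) plus the routine library."""
    return [(a,) for a in primitives] + list(routines)
```

A policy acting over the list returned by `augmented_action_space` would pick either a single primitive action or a whole routine at each decision point, which is one simple way to read "learning at primitive level and routine level".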


