JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

12/07/2021
by Zichuan Lin, et al.

Learning rational behaviors in open-world games like Minecraft remains challenging for Reinforcement Learning (RL) research because it combines partial observability, high-dimensional visual perception, and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach that uses representation learning and imitation learning to deal with perception and exploration. Specifically, our approach has two levels of hierarchy: a high-level controller learns a policy over options, and low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning, which captures the underlying relations between actions and representations, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.
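
The two-level hierarchy described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy environment, the option names (`chop_tree`, `craft_planks`), and the hand-written rules are all hypothetical stand-ins. In JueWu-MC both the controller and the workers are learned, and a worker would typically run for many steps per option.

```python
from typing import Callable, Dict, List, Tuple

# Toy observation: inventory counts only (hypothetical stand-in for
# the visual observations a real agent would receive).
Observation = Dict[str, int]
Worker = Callable[[Observation], str]


def chop_tree(obs: Observation) -> str:
    """Low-level worker for the hypothetical 'chop_tree' sub-task."""
    return "attack"


def craft_planks(obs: Observation) -> str:
    """Low-level worker for the hypothetical 'craft_planks' sub-task."""
    return "craft_planks"


WORKERS: Dict[str, Worker] = {
    "chop_tree": chop_tree,
    "craft_planks": craft_planks,
}


def high_level_policy(obs: Observation) -> str:
    """Pick an option. A fixed rule here; learned in the real system."""
    return "chop_tree" if obs["log"] < 2 else "craft_planks"


def step_env(obs: Observation, action: str) -> Observation:
    """Toy transition function (assumption, not the MineRL API)."""
    nxt = dict(obs)
    if action == "attack":
        nxt["log"] += 1
    elif action == "craft_planks" and nxt["log"] > 0:
        nxt["log"] -= 1
        nxt["planks"] += 4
    return nxt


def run_episode(max_steps: int = 10) -> Tuple[List[Tuple[str, str]], Observation]:
    """Alternate between option selection and worker execution."""
    obs: Observation = {"log": 0, "planks": 0}
    trace: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        option = high_level_policy(obs)   # high level: choose a sub-task
        action = WORKERS[option](obs)     # low level: act on that sub-task
        obs = step_env(obs, action)
        trace.append((option, action))
        if obs["planks"] >= 4:            # toy goal reached
            break
    return trace, obs


if __name__ == "__main__":
    episode_trace, final_obs = run_episode()
    print(episode_trace, final_obs)
```

The point of the split is that each worker only has to solve a short-horizon sub-task, while the controller handles the long-horizon ordering of sub-tasks.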

research
03/01/2018

Hierarchical Imitation and Reinforcement Learning

We study the problem of learning policies over long time horizons. We pr...
research
12/22/2020

Self-Imitation Advantage Learning

Self-imitation learning is a Reinforcement Learning (RL) method that enc...
research
03/14/2018

Imitation Learning with Concurrent Actions in 3D Games

In this work we describe a novel deep reinforcement learning neural netw...
research
08/28/2019

An Empirical Comparison on Imitation Learning and Reinforcement Learning for Paraphrase Generation

Generating paraphrases from given sentences involves decoding words step...
research
07/01/2020

Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

Autonomous driving has achieved significant progress in recent years, bu...
research
06/22/2022

Learning Representations for Control with Hierarchical Forward Models

Learning control from pixels is difficult for reinforcement learning (RL...
research
12/23/2020

Augmenting Policy Learning with Routines Discovered from a Single Demonstration

Humans can abstract prior knowledge from very little data and use it to ...
