Learning a Behavioral Repertoire from Demonstrations

07/05/2019
by Niels Justesen, et al.

Imitation Learning (IL) is a machine learning approach for learning a policy from a dataset of demonstrations. IL can be used to kick-start learning before applying reinforcement learning (RL), but it is also useful on its own, e.g. for learning to imitate human players in video games. However, a major limitation of current IL approaches is that they learn only a single "average" policy, even when the dataset contains demonstrations of many different types of behavior. In this paper, we propose a new approach called Behavioral Repertoire Imitation Learning (BRIL) that instead learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human replays to perform build-order planning in StarCraft II. Principal Component Analysis (PCA) is applied to construct a low-dimensional behavioral space from the high-dimensional army unit composition of each demonstration. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, we are able to adapt the behavior of the policy between games and reach a performance beyond that of the traditional IL baseline approach.
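The between-game adaptation described above can be pictured as a multi-armed bandit over a small set of candidate behavior descriptions. The following is a minimal sketch of standard UCB1 under that assumption, not the authors' implementation: the names `ucb1_select`, `adapt_between_games`, and `play_game`, the example points in a two-dimensional behavioral space, and the reward rule are all hypothetical and chosen only for illustration.

```python
import math


def ucb1_select(counts, rewards, c=math.sqrt(2)):
    """Return the index of the arm UCB1 would play next.

    counts[i]  : number of games already played with behavior i
    rewards[i] : summed reward (e.g. wins) obtained with behavior i
    """
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Mean reward plus exploration bonus; with c = sqrt(2) this is
    # the usual UCB1 score mean + sqrt(2 * ln(total) / n).
    scores = [
        rewards[arm] / counts[arm]
        + c * math.sqrt(math.log(total) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def adapt_between_games(behaviors, play_game, n_games):
    """Choose one behavior description per game and update its statistics."""
    counts = [0] * len(behaviors)
    rewards = [0.0] * len(behaviors)
    for _ in range(n_games):
        arm = ucb1_select(counts, rewards)
        # play_game is a hypothetical callback: it would run one game with
        # the policy conditioned on behaviors[arm] and return a reward.
        reward = play_game(behaviors[arm])
        counts[arm] += 1
        rewards[arm] += reward
    return counts, rewards


# Hypothetical candidate behaviors as points in a 2D (PCA-like) space.
behaviors = [(0.0, 0.0), (1.0, -0.5), (-0.8, 0.7)]


def toy_game(behavior):
    # Toy stand-in for a real match: only the last behavior ever wins.
    return 1.0 if behavior == behaviors[2] else 0.0


counts, rewards = adapt_between_games(behaviors, toy_game, n_games=200)
best = max(range(len(behaviors)), key=counts.__getitem__)
print("games per behavior:", counts, "most played:", behaviors[best])
```

In this toy setting the bandit concentrates its games on the one behavior that wins, while still revisiting the others occasionally because of the exploration bonus. In an actual system the reward would come from game outcomes against an opponent rather than a fixed rule.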


