Behavioral Cloning via Search in Embedded Demonstration Dataset

06/15/2023
by Federico Malato, et al.

Behavioral cloning uses a dataset of demonstrations to learn a behavioral policy. To overcome various learning and policy adaptation problems, we propose using a latent space to index a demonstration dataset, instantly access similar, relevant experiences, and copy behavior from these situations. The agent performs the actions from a selected similar situation until the representations of its current situation and of the selected experience diverge in the latent space. Thus, we formulate our control problem as a search problem over a dataset of experts' demonstrations. We test our approach on the BASALT MineRL dataset, using the latent representation of a Video PreTraining model. We compare our model to state-of-the-art Minecraft agents. Our approach can effectively recover meaningful demonstrations and produces human-like agent behavior in the Minecraft environment across a wide variety of scenarios. Experimental results reveal that the performance of our search-based approach is comparable to that of trained models, while allowing zero-shot task adaptation by changing the demonstration examples.
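The search-and-copy loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SearchPolicy`, the Euclidean distance, the fixed `divergence_threshold`, and the rule of re-searching at the end of a demonstration are all assumptions made for illustration, and the latent vectors are taken as given (in the paper they come from a Video PreTraining model).

```python
import numpy as np


class SearchPolicy:
    """Hypothetical search-based policy over embedded demonstrations.

    Assumptions (not from the paper): Euclidean distance in latent space,
    a fixed divergence threshold, and demonstrations stored as flat,
    time-ordered arrays with an episode id per frame.
    """

    def __init__(self, latents, actions, episode_ids, divergence_threshold):
        self.latents = np.asarray(latents, dtype=float)  # shape (N, D)
        self.actions = list(actions)                     # one action per frame
        self.episode_ids = np.asarray(episode_ids)       # shape (N,)
        self.threshold = divergence_threshold
        self.cursor = None  # index of the demonstration frame being copied

    def _search(self, z):
        # Nearest-neighbour lookup over all embedded demonstration frames.
        dists = np.linalg.norm(self.latents - z, axis=1)
        return int(np.argmin(dists))

    def act(self, z):
        z = np.asarray(z, dtype=float)
        # Search again if nothing is being followed, or if the agent's
        # latent has diverged from the frame being followed.
        diverged = (
            self.cursor is None
            or np.linalg.norm(self.latents[self.cursor] - z) > self.threshold
        )
        if diverged:
            self.cursor = self._search(z)

        action = self.actions[self.cursor]

        # Advance along the same demonstration; at its end, force a new search.
        nxt = self.cursor + 1
        same_episode = (
            nxt < len(self.actions)
            and self.episode_ids[nxt] == self.episode_ids[self.cursor]
        )
        self.cursor = nxt if same_episode else None
        return action


# Toy usage with two short "demonstrations" in a 2-D latent space.
policy = SearchPolicy(
    latents=[[0, 0], [1, 0], [2, 0], [10, 10], [11, 10]],
    actions=["a0", "a1", "a2", "b0", "b1"],
    episode_ids=[0, 0, 0, 1, 1],
    divergence_threshold=0.5,
)
trace = [
    policy.act([0.1, 0.0]),    # initial search selects the first episode
    policy.act([1.0, 0.1]),    # still close: keep copying the same episode
    policy.act([10.2, 10.0]),  # diverged: search selects the second episode
]
```

Under these assumptions, swapping the `latents`, `actions`, and `episode_ids` arrays for those of a different set of demonstrations changes the agent's behavior without any training step, which is the sense in which the abstract speaks of zero-shot task adaptation.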


research
12/27/2022

Behavioral Cloning via Search in Video PreTraining Latent Space

Our aim is to build autonomous agents that can solve tasks in environmen...
research
10/26/2022

Leveraging Demonstrations with Latent Space Priors

Demonstrations provide insight into relevant state or action space regio...
research
03/14/2022

Safe adaptation in multiagent competition

Achieving the capability of adapting to ever-changing environments is a ...
research
03/25/2020

Adaptive Conditional Neural Movement Primitives via Representation Sharing Between Supervised and Reinforcement Learning

Learning by Demonstration provides a sample efficient way to equip robot...
research
09/08/2021

Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms

Humans excel at learning long-horizon tasks from demonstrations augmente...
research
05/04/2016

A Bayesian Approach to Policy Recognition and State Representation Learning

Learning from demonstration (LfD) is the process of building behavioral ...
research
10/16/2017

Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gr...
