Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations

09/24/2022
by Letian Chen, et al.

Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are capable of neither fast adaptation to heterogeneous human demonstrations nor large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three continuous control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a real-robot table tennis task.
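
The idea of approximating a new demonstration with a mixture of already-learned strategies, and growing the model only when no mixture fits, can be illustrated with a toy sketch. This is not the FLAIR algorithm: the scalar states and actions, the action-level squared-error criterion, the grid search over mixture weights, and every name below (`Policy`, `fit_mixture`, `adapt`, `threshold`) are assumptions made purely for illustration.

```python
from itertools import product
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

# Hypothetical types: a policy maps a scalar state to a scalar action,
# and a demonstration is a list of (state, action) pairs.
Policy = Callable[[float], float]
Demo = Sequence[Tuple[float, float]]


def mixture_action(policies: Sequence[Policy], weights: Sequence[float], state: float) -> float:
    """Action of the mixture: weighted sum of the prototype policies' actions."""
    return sum(w * p(state) for w, p in zip(weights, policies))


def simplex_grid(k: int, steps: int) -> Iterator[List[float]]:
    """Yield length-k weight vectors with entries i/steps that sum to 1."""
    for combo in product(range(steps + 1), repeat=k):
        if sum(combo) == steps:
            yield [c / steps for c in combo]


def fit_mixture(policies: Sequence[Policy], demo: Demo, steps: int = 10) -> Tuple[List[float], float]:
    """Grid-search the mixture weights that minimize mean squared action error."""
    best_w: List[float] = []
    best_err = float("inf")
    for w in simplex_grid(len(policies), steps):
        err = sum((mixture_action(policies, w, s) - a) ** 2 for s, a in demo) / len(demo)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


def adapt(policies: Sequence[Policy], demo: Demo, threshold: float = 1e-3) -> Tuple[Optional[List[float]], bool]:
    """Return (weights, needs_new_strategy).

    If some mixture explains the demonstration within `threshold`, reuse the
    existing strategies; otherwise signal that a new strategy should be learned
    (the learning step itself is outside this sketch).
    """
    weights, err = fit_mixture(policies, demo)
    if err <= threshold:
        return weights, False
    return None, True


if __name__ == "__main__":
    prototypes = [lambda s: s, lambda s: -s]          # two toy strategies
    demo = [(s, 0.4 * s) for s in (0.5, 1.0, 2.0)]    # a blend of the two
    print(adapt(prototypes, demo))
```

The threshold controls the trade-off in this toy version: a looser threshold reuses existing strategies more often and keeps the model small, while a tighter one triggers new strategies sooner.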
