Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation

07/12/2023
by Andi Peng, et al.

Policies often fail due to distribution shift – changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
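The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that a state is a dictionary of named concepts, that a demonstration is a list of (state, action) pairs, and that the user has already marked some concepts as task-irrelevant. The function name `augment_demo`, the concept names, and the value lists are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a state maps concept names to values,
# and a demonstration is a list of (state, action) pairs.
State = Dict[str, str]
Demo = List[Tuple[State, str]]


def augment_demo(
    demo: Demo,
    irrelevant: Sequence[str],
    concept_values: Dict[str, Sequence[str]],
    n_copies: int,
    seed: int = 0,
) -> List[Demo]:
    """Create copies of a demonstration in which only the concepts the
    user marked as task-irrelevant are resampled; actions and all other
    concepts are left untouched."""
    rng = random.Random(seed)
    augmented: List[Demo] = []
    for _ in range(n_copies):
        # Sample one value per irrelevant concept and hold it fixed
        # across the whole trajectory so each copy stays consistent.
        sampled = {c: rng.choice(concept_values[c]) for c in irrelevant}
        augmented.append(
            [({**state, **sampled}, action) for state, action in demo]
        )
    return augmented


# Illustrative data only: the user says colour does not matter.
demo = [
    ({"object": "cup", "color": "red", "goal": "shelf"}, "pick"),
    ({"object": "cup", "color": "red", "goal": "shelf"}, "place"),
]
copies = augment_demo(
    demo,
    irrelevant=["color"],
    concept_values={"color": ["red", "blue", "green"]},
    n_copies=4,
)
```

Fine-tuning on the original demonstration together with `copies` is one way a policy could be encouraged to ignore the concepts a particular user considers irrelevant, while concepts the user cares about (here `object` and `goal`) are never altered.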

research
10/18/2022

Output Feedback Tube MPC-Guided Data Augmentation for Robust, Efficient Sensorimotor Policy Learning

Imitation learning (IL) can generate computationally efficient sensorimo...
research
02/14/2022

Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous Demonstration

Learning from Demonstration (LfD) approaches empower end-users to teach ...
research
12/10/2020

Understanding Learned Reward Functions

In many real-world tasks, it is not possible to procedurally specify an ...
research
01/29/2022

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

In order for humans to confidently decide where to employ RL agents for ...
research
10/09/2021

Credit Assignment Safety Learning from Human Demonstrations

A critical need in assistive robotics, such as assistive wheelchairs for...
research
03/09/2022

Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method

For robots to be effectively deployed in novel environments and tasks, t...
research
11/15/2022

PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning

Several recent works show impressive results in mapping language-based h...
