Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

11/20/2022
by   Zhizhou Ren, et al.

Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence. Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot adaptation to unseen tasks. Despite recent advances in meta-RL, most existing methods require access to the environmental reward function of new tasks to infer the task objective, which is not realistic in many practical applications. To bridge this gap, we study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning. We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback. The agent can adapt to new tasks by querying a human's preferences between behavior trajectories instead of using per-step numeric rewards. By extending techniques from information theory, our approach can design query sequences that maximize the information gain from human interactions while tolerating the inherent errors of a non-expert human oracle. In experiments, we extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a variety of meta-RL benchmark tasks and demonstrate substantial improvement over baseline algorithms in terms of both feedback efficiency and error tolerance.
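
To make the idea of information-maximizing preference queries under a noisy oracle concrete, here is a minimal sketch. It is not the ANOLE algorithm, whose details the abstract does not give. It assumes a finite set of candidate tasks, a hypothetical table `returns[k][t]` holding the return of trajectory `t` under task `k`, and an oracle that flips its answer with a known probability `eps`. All function names are illustrative. Each step picks the trajectory pair whose answer has the highest mutual information with the task identity, then applies a Bayesian update.

```python
import itertools
import math
import random


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def prob_prefers_first(returns_k, i, j, eps):
    """P(oracle prefers trajectory i over j | task k), assuming flip rate eps."""
    return 1.0 - eps if returns_k[i] > returns_k[j] else eps


def information_gain(posterior, returns, i, j, eps):
    """Mutual information between the oracle's answer and the task identity."""
    p_yes = sum(p * prob_prefers_first(r, i, j, eps)
                for p, r in zip(posterior, returns))
    # H(answer) - H(answer | task); the conditional entropy is H_b(eps) for every task.
    return binary_entropy(p_yes) - binary_entropy(eps)


def select_query(posterior, returns, eps):
    """Pick the trajectory pair with the largest expected information gain."""
    pairs = itertools.combinations(range(len(returns[0])), 2)
    return max(pairs,
               key=lambda ij: information_gain(posterior, returns, ij[0], ij[1], eps))


def update_posterior(posterior, returns, i, j, answer, eps):
    """Bayes update of the task posterior after one (possibly wrong) answer."""
    unnorm = []
    for p, r in zip(posterior, returns):
        p_yes = prob_prefers_first(r, i, j, eps)
        unnorm.append(p * (p_yes if answer else 1.0 - p_yes))
    total = sum(unnorm)
    return [u / total for u in unnorm]


def adapt(returns, true_task, eps, n_queries, rng):
    """Simulate n_queries rounds against a noisy oracle for the true task."""
    posterior = [1.0 / len(returns)] * len(returns)
    for _ in range(n_queries):
        i, j = select_query(posterior, returns, eps)
        truthful = returns[true_task][i] > returns[true_task][j]
        # The simulated oracle flips its answer with probability eps.
        answer = truthful if rng.random() >= eps else not truthful
        posterior = update_posterior(posterior, returns, i, j, answer, eps)
    return posterior


if __name__ == "__main__":
    # Hypothetical returns: 4 candidate tasks, 3 candidate trajectories.
    toy_returns = [[3, 2, 1], [1, 2, 3], [2, 3, 1], [1, 3, 2]]
    print(adapt(toy_returns, true_task=2, eps=0.1, n_queries=10, rng=random.Random(0)))
```

In this toy setting a larger `eps` lowers the information each answer carries, so more queries are needed before the posterior concentrates. That is the trade-off between feedback efficiency and error tolerance the abstract refers to.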


research
12/06/2022

Few-Shot Preference Learning for Human-in-the-Loop RL

While reinforcement learning (RL) has become a more popular approach for...
research
07/19/2022

Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks

Meta reinforcement learning (meta-RL) aims to learn a policy solving a s...
research
03/09/2022

Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method

For robots to be effectively deployed in novel environments and tasks, t...
research
08/08/2021

Meta-Reinforcement Learning in Broad and Non-Parametric Environments

Recent state-of-the-art artificial agents lack the ability to adapt rapi...
research
06/06/2023

Zero-shot Preference Learning for Offline RL via Optimal Transport

Preference-based Reinforcement Learning (PbRL) has demonstrated remarkab...
research
06/27/2022

Prompting Decision Transformer for Few-Shot Policy Generalization

Humans can leverage prior experience and learn novel tasks from a handfu...
research
05/27/2023

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) provides a natural way to...
