Offline Preference-Based Apprenticeship Learning

07/20/2021
by Daniel Shin, et al.

We study how an offline dataset of prior (possibly random) experience can be used to address two challenges that autonomous systems face when they try to learn from, adapt to, and collaborate with humans: (1) identifying the human's intent and (2) safely optimizing the autonomous system's behavior to achieve this inferred intent. First, we use the offline dataset to efficiently infer the human's reward function via pool-based active preference learning. Second, given this learned reward function, we perform offline reinforcement learning to optimize a policy based on the inferred human intent. Crucially, our approach requires neither physical rollouts nor an accurate simulator for either the reward learning or the policy optimization step, enabling apprenticeship learning that is both safe and efficient. We identify a subset of existing offline RL benchmarks that are well suited to offline reward learning and evaluate our approach on them, and we also evaluate extensions of these benchmarks that allow more open-ended behaviors. Our experiments show that offline preference-based reward learning followed by offline reinforcement learning yields efficient, high-performing policies while requiring only a small number of preference queries. Videos available at https://sites.google.com/view/offline-prefs.
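
The abstract does not specify the reward model or the query-selection rule. As a minimal sketch under stated assumptions, the two-stage pipeline might look like the code below: a linear reward over hypothetical per-segment feature vectors, a Bradley-Terry preference model, and ensemble disagreement as the active query criterion (an assumed choice, not necessarily the authors'). Queries are drawn only from a fixed offline pool and answered by a simulated human, then the offline transitions are relabeled with the learned reward. All names (`fit_reward`, `select_query`, `active_preference_learning`, `relabel`, the `phi` key) are illustrative and hypothetical.

```python
import math
import random

# Hypothetical setup (not from the paper): each trajectory segment in the offline
# pool is summarized by a feature vector, and reward is assumed linear in it.


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_prob(w, seg_a, seg_b):
    """Bradley-Terry: P(seg_a preferred to seg_b) = sigmoid(R(seg_a) - R(seg_b))."""
    return sigmoid(dot(w, seg_a) - dot(w, seg_b))


def fit_reward(prefs, dim, seed, steps=300, lr=0.5):
    """Fit one linear reward by gradient ascent on the Bradley-Terry log-likelihood.

    prefs: list of (seg_a, seg_b, label), label = 1 if seg_a was preferred, else 0.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # seeded init gives ensemble diversity
    if not prefs:
        return w
    for _ in range(steps):
        grad = [0.0] * dim
        for a, b, label in prefs:
            err = label - preference_prob(w, a, b)  # d(log-lik) / d(R(a) - R(b))
            for k in range(dim):
                grad[k] += err * (a[k] - b[k])
        w = [wk + lr * gk / len(prefs) for wk, gk in zip(w, grad)]
    return w


def select_query(pool, ensemble, asked):
    """Pick the unasked pair whose predicted preference varies most across the ensemble."""
    best, best_var = None, -1.0
    for i in range(len(pool)):
        for j in range(i + 1, len(pool)):
            if (i, j) in asked:
                continue
            probs = [preference_prob(w, pool[i], pool[j]) for w in ensemble]
            mean = sum(probs) / len(probs)
            var = sum((p - mean) ** 2 for p in probs) / len(probs)
            if var > best_var:
                best, best_var = (i, j), var
    return best


def active_preference_learning(pool, true_w, n_queries=10, n_models=5):
    """Query a simulated, noiseless human only about segments already in the pool."""
    dim = len(pool[0])
    prefs, asked = [], set()
    ensemble = [fit_reward(prefs, dim, seed=m) for m in range(n_models)]
    for _ in range(n_queries):
        i, j = select_query(pool, ensemble, asked)
        asked.add((i, j))
        # Simulated human: prefers the segment with the higher hidden reward.
        label = 1 if dot(true_w, pool[i]) > dot(true_w, pool[j]) else 0
        prefs.append((pool[i], pool[j], label))
        ensemble = [fit_reward(prefs, dim, seed=m) for m in range(n_models)]
    mean_w = [sum(col) / n_models for col in zip(*ensemble)]
    return mean_w, asked


def relabel(transitions, w):
    """Overwrite each offline transition's reward with the learned reward."""
    return [dict(t, reward=dot(w, t["phi"])) for t in transitions]


rng = random.Random(0)
pool = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(20)]
w_hat, asked = active_preference_learning(pool, true_w=[1.0, -0.5], n_queries=10)
data = relabel([{"phi": x, "action": 0} for x in pool], w_hat)
print("learned reward weights:", w_hat)
```

Because every query compares two segments that already exist in the pool, no new environment interaction is needed to learn the reward. The relabeled `data` would then be handed to an offline RL algorithm of one's choice, which this sketch deliberately leaves out.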
