Batch Active Preference-Based Learning of Reward Functions

10/10/2018
by Erdem Bıyık, et al.

Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we develop a new algorithm, batch active preference-based learning, that learns reward functions efficiently from as few data samples as possible while keeping query generation times short. We introduce several approximations to the batch active learning problem and provide theoretical guarantees for the convergence of our algorithms. Finally, we present experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries, and that these queries are computed in a short amount of time. We also showcase our algorithm in a user study that learns human users' preferences.
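
The abstract does not spell out how queries are scored or batched. The following is a minimal, hypothetical sketch of the general idea of batch active preference-based reward learning, under assumptions that are not taken from the paper. Rewards are assumed linear in trajectory features (r = w·φ), and each query is summarized by the feature difference psi = φ(A) − φ(B). A logistic preference model with a hypothetical rationality coefficient beta is used, and the posterior over w is represented by importance-weighted prior samples. Queries are scored with a volume-removal-style criterion min(E[p], 1 − E[p]). The batch rule is the naive "greedy" one, which takes the batch_size best-scoring queries independently. All names (learn_reward, select_batch_greedy, beta, batch_size) are illustrative only.

```python
import math
import random


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_unit(rng, d):
    """Sample a direction uniformly on the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def normalized_weights(log_w):
    """Turn log importance weights into weights that sum to 1."""
    m = max(log_w)
    ws = [math.exp(lw - m) for lw in log_w]
    total = sum(ws)
    return [x / total for x in ws]


def select_batch_greedy(samples, weights, pool, beta, batch_size):
    """Naive greedy batch (an assumption, not the paper's method): rank every
    candidate by min(E[p], 1 - E[p]) under the current posterior and keep the
    top batch_size queries, ignoring redundancy between them."""
    def score(psi):
        # Posterior-averaged probability that the user prefers option A.
        p = sum(wt * math.exp(log_sigmoid(beta * dot(w, psi)))
                for wt, w in zip(weights, samples))
        return min(p, 1.0 - p)
    return sorted(pool, key=score, reverse=True)[:batch_size]


def learn_reward(w_true, n_samples=1000, pool_size=100,
                 n_batches=5, batch_size=4, beta=10.0, seed=0):
    """Estimate reward weights from batches of simulated preference answers."""
    rng = random.Random(seed)
    d = len(w_true)
    # Prior: reward weight directions uniform on the unit sphere.
    samples = [random_unit(rng, d) for _ in range(n_samples)]
    log_w = [0.0] * n_samples
    # Hypothetical candidate queries, each stored as psi = phi(A) - phi(B).
    pool = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(pool_size)]
    for _ in range(n_batches):
        weights = normalized_weights(log_w)
        batch = select_batch_greedy(samples, weights, pool, beta, batch_size)
        # All queries in the batch are asked before the posterior is updated.
        for psi in batch:
            pool.remove(psi)
            # Simulated noiseless user: prefers A iff w_true . psi > 0.
            sign = 1.0 if dot(w_true, psi) > 0 else -1.0
            for i, w in enumerate(samples):
                log_w[i] += log_sigmoid(sign * beta * dot(w, psi))
    # Point estimate: posterior mean direction, rescaled to unit norm.
    weights = normalized_weights(log_w)
    mean = [sum(wt * w[k] for wt, w in zip(weights, samples)) for k in range(d)]
    norm = math.sqrt(dot(mean, mean))
    return [x / norm for x in mean]


if __name__ == "__main__":
    w_true = [0.6, -0.8, 0.0]  # hypothetical simulated user's weights
    w_hat = learn_reward(w_true)
    print("cosine similarity to w_true:", dot(w_hat, w_true))
```

Because all queries in a batch are chosen before any answers arrive, picking the top-scoring queries independently can return near-duplicate questions. This is one reason a batch method needs more than this naive rule. The sketch does not attempt to reproduce the approximations introduced in the paper.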

research
06/21/2019

Learning Reward Functions by Integrating Human Demonstrations and Preferences

Our goal is to accurately and efficiently learn reward functions for aut...
research
08/16/2021

APReL: A Library for Active Preference-based Reward Learning Algorithms

Reward learning is a fundamental problem in robotics to have robots that...
research
06/19/2019

Batch Active Learning Using Determinantal Point Processes

Data collection and labeling is one of the main challenges in employing ...
research
09/10/2018

Learning Time Dependent Choice

We explore questions dealing with the learnability of models of choice o...
research
06/26/2023

BatchGFN: Generative Flow Networks for Batch Active Learning

We introduce BatchGFN – a novel approach for pool-based active learning ...
research
03/14/2016

Active Algorithms For Preference Learning Problems with Multiple Populations

In this paper we model the problem of learning preferences of a populati...
research
03/13/2020

Human Preference-Based Learning for High-dimensional Optimization of Exoskeleton Walking Gaits

Understanding users' gait preferences of a lower-body exoskeleton requir...