Reward Learning with Intractable Normalizing Functions

05/16/2023
by Joshua Hoegerman, et al.

Robots can learn to imitate humans by inferring what the human is optimizing for. One common framework for this is Bayesian reward learning, where the robot treats the human's demonstrations and corrections as observations of their underlying reward function. Unfortunately, this inference is doubly intractable: the robot must reason over all the trajectories the person could have provided and all the rewards the person could have in mind. Prior work approximates this intractable normalizer with existing robotics tools. In this paper, we group previous approaches into three fundamental classes and analyze the theoretical pros and cons of each class. We then leverage recent research from the statistics community to introduce Double MH reward learning, a Monte Carlo method for asymptotically learning the human's reward in continuous spaces. We extend Double MH to conditionally independent settings (where each human correction is treated as completely separate) and conditionally dependent settings (where the human's current correction may build on their previous inputs). Across simulations and user studies, our proposed approach infers the human's reward parameters more accurately than the alternate approximations when learning from either demonstrations or corrections. See videos here: https://youtu.be/EkmT3o5K5ko
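
To make the double intractability concrete, the Boltzmann-rational model that this line of work typically builds on has the following form (the notation below is a standard sketch, not necessarily the paper's):

```latex
p(\theta \mid \xi) = \frac{p(\xi \mid \theta)\, p(\theta)}{p(\xi)},
\qquad
p(\xi \mid \theta) = \frac{\exp\!\big(\beta R_\theta(\xi)\big)}{Z(\theta)},
\qquad
Z(\theta) = \int_{\Xi} \exp\!\big(\beta R_\theta(\xi')\big)\, d\xi'
```

Standard Metropolis-Hastings cancels the outer normalizer p(ξ) but leaves the ratio Z(θ)/Z(θ') inside the acceptance test, and that ratio cannot be evaluated. In the exchange-algorithm family that Double MH belongs to, the fix is to draw an auxiliary trajectory ξ̃ approximately from p(· | θ') with an inner MH chain, and to accept the proposal θ' with probability

```latex
\alpha = \min\!\left(1,\;
\frac{p(\theta')\,\exp\!\big(\beta R_{\theta'}(\xi)\big)\,\exp\!\big(\beta R_{\theta}(\tilde{\xi})\big)}
     {p(\theta)\,\exp\!\big(\beta R_{\theta}(\xi)\big)\,\exp\!\big(\beta R_{\theta'}(\tilde{\xi})\big)}\right),
```

in which Z(θ) and Z(θ') cancel exactly. The sketch below illustrates this nested-chain structure on a toy continuous problem; the linear-plus-effort reward, standard normal prior, Gaussian random-walk proposals, and all step sizes are assumptions made for illustration, not the paper's implementation.

```python
# Minimal sketch of Double MH for reward learning on a toy problem.
# Assumed for illustration: trajectories are D-dimensional vectors,
# R_theta(xi) = theta . xi - 0.5 ||xi||^2 (so exp(BETA * R) is proper),
# a standard normal prior on theta, and Gaussian random-walk proposals.
import numpy as np

rng = np.random.default_rng(0)
BETA = 1.0   # Boltzmann rationality coefficient (assumed)
D = 3        # dimension of reward parameters / toy trajectories

def reward(theta, xi):
    """Toy reward: linear preference term minus an effort penalty."""
    return float(theta @ xi - 0.5 * xi @ xi)

def inner_mh(theta, xi0, steps=50, scale=0.5):
    """Inner chain: approximately sample an auxiliary trajectory
    xi ~ exp(BETA * R_theta(xi)) / Z(theta), started at the data."""
    xi = xi0.copy()
    for _ in range(steps):
        prop = xi + scale * rng.normal(size=D)
        if np.log(rng.random()) < BETA * (reward(theta, prop) - reward(theta, xi)):
            xi = prop
    return xi

def double_mh(xi_data, n_samples=5000, scale=0.2):
    """Outer chain over reward parameters theta. The auxiliary
    trajectory makes Z(theta) and Z(theta') cancel in the ratio."""
    theta = rng.normal(size=D)
    samples = []
    for _ in range(n_samples):
        theta_p = theta + scale * rng.normal(size=D)
        xi_aux = inner_mh(theta_p, xi_data)
        log_a = (
            BETA * (reward(theta_p, xi_data) - reward(theta, xi_data))  # likelihood at data
            + BETA * (reward(theta, xi_aux) - reward(theta_p, xi_aux))  # auxiliary swap term
            + 0.5 * (theta @ theta - theta_p @ theta_p)                 # N(0, I) prior
        )
        if np.log(rng.random()) < log_a:
            theta = theta_p
        samples.append(theta.copy())
    return np.array(samples)

xi_observed = np.array([1.0, -0.5, 2.0])   # one toy human correction
posterior = double_mh(xi_observed)
print("posterior mean of theta:", posterior.mean(axis=0))
```

Because the inner chain only approximately samples the auxiliary trajectory, the outer chain is exact only in the limit of a converged inner chain; in practice the inner chain runs for a fixed budget, which is the usual Double MH trade-off. For conditionally dependent corrections, the likelihood terms at the data would become a product over the sequence of human inputs, each conditioned on the earlier ones, and the cancellation argument is unchanged.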

Related research

- LESS is More: Rethinking Probabilistic Models of Human Behavior (01/13/2020)
- Learning Preferences for Interactive Autonomy (10/19/2022)
- StROL: Stabilized and Robust Online Learning from Humans (08/19/2023)
- Choice Set Misspecification in Reward Inference (01/19/2021)
- Feature Expansive Reward Learning: Rethinking Human Input (06/23/2020)
- Asking Easy Questions: A User-Friendly Approach to Active Reward Learning (10/10/2019)
- On the Sensitivity of Reward Inference to Misspecified Human Models (12/09/2022)
