I Know What You Meant: Learning Human Objectives by (Under)estimating Their Choice Set

11/11/2020
by Ananth Jonnavittula, et al.

Assistive robots have the potential to help people perform everyday tasks. However, these robots first need to learn what it is their user wants them to do. Teaching assistive robots is hard for inexperienced users, elderly users, and users living with physical disabilities, since often these individuals are unable to teleoperate the robot along their desired behavior. We know that inclusive learners should give human teachers credit for what they cannot demonstrate. But today's robots do the opposite: they assume every user is capable of providing any demonstration. As a result, these robots learn to mimic the demonstrated behavior, even when that behavior isn't what the human really meant! We propose an alternate approach to reward learning: robots that reason about the user's demonstrations in the context of similar or simpler alternatives. Unlike prior works – which err towards overestimating the human's capabilities – here we err towards underestimating what the human can input (i.e., their choice set). Our theoretical analysis proves that underestimating the human's choice set is risk-averse, with better worst-case performance than overestimating. We formalize three properties to generate similar and simpler alternatives: across simulations and a user study, our algorithm better enables robots to extrapolate the human's objective. See our user study here: https://youtu.be/RgbH2YULVRo
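The contrast between overestimating and underestimating the choice set can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the Boltzmann-rational human model, the one-dimensional "how far the arm was moved" trajectories, the reward function, and the names `posterior`, `reward`, `overestimated`, and `underestimated` are all hypothetical. The user wants the arm at position 1.0 but can only push it to 0.5.

```python
import math

def posterior(demo, choice_set, thetas, reward, beta=5.0):
    """Belief over candidate objectives after one demonstration (uniform prior).

    Assumes a Boltzmann-rational human: the demonstration is chosen from
    `choice_set` with probability proportional to exp(beta * reward).
    """
    likelihoods = []
    for theta in thetas:
        normalizer = sum(math.exp(beta * reward(xi, theta)) for xi in choice_set)
        likelihoods.append(math.exp(beta * reward(demo, theta)) / normalizer)
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]

def reward(xi, theta):
    # Hypothetical reward: negative distance between where the arm ends (xi)
    # and where the user wants it (theta).
    return -abs(xi - theta)

thetas = [0.5, 1.0]   # candidate goals the robot considers
demo = 0.5            # the farthest this user could actually move the arm

# Overestimate: assume the user could have shown any of these trajectories.
overestimated = [0.0, 0.25, 0.5, 0.75, 1.0]
# Underestimate: keep only alternatives no harder than the demonstration.
underestimated = [xi for xi in overestimated if xi <= demo]

over = posterior(demo, overestimated, thetas, reward)
under = posterior(demo, underestimated, thetas, reward)
print("overestimated choice set: ", over)
print("underestimated choice set:", under)
```

In this toy setting, the robot that overestimates the choice set places most of its belief on the goal at 0.5, because it assumes the user would have reached 1.0 if they wanted to. The robot that underestimates stays close to evenly split between the two goals, which is the more cautious outcome when the demonstration was limited by what the user could physically provide.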

research
07/20/2021

Learning to Share Autonomy Across Repeated Interaction

Wheelchair-mounted robotic arms (and other assistive robots) should help...
research
02/22/2022

Communicating Robot Conventions through Shared Autonomy

When humans control robot arms these robots often need to infer the huma...
research
03/23/2022

RILI: Robustly Influencing Latent Intent

When robots interact with human partners, often these partners change th...
research
01/02/2023

SIRL: Similarity-based Implicit Representation Learning

When robots learn reward functions using high capacity models that take ...
research
01/06/2021

Optimal Action-based or User Prediction-based Haptic Guidance: Can You Do Even Better?

The recently advanced robotics technology enables robots to assist users...
research
08/19/2023

StROL: Stabilized and Robust Online Learning from Humans

Today's robots can learn the human's reward function online, during the ...
research
06/23/2023

AR2-D2: Training a Robot Without a Robot

Diligently gathered human demonstrations serve as the unsung heroes empo...
