Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

01/02/2020
by Letian Chen, et al.

Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. Inverse reinforcement learning (IRL), by contrast, seeks to learn a reward function from readily obtained human demonstrations. Yet IRL suffers from two major limitations: 1) reward ambiguity: infinitely many reward functions could explain an expert's demonstration, and 2) heterogeneity: human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult because of the common assumption that all demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans' strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy's objective (handling heterogeneity). We demonstrate that our algorithm better recovers the task reward and strategy rewards, and better imitates the strategies, in two simulated tasks and a real-world table tennis task.
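
To make the idea of separating a shared task reward from per-demonstrator strategy rewards concrete, here is a minimal sketch. It is not the paper's implementation: the linear reward form, the feature vectors, the function names, and the absolute-value penalty are all assumptions chosen for illustration, since the abstract does not specify the network architecture or the loss.

```python
# Hypothetical sketch: linear rewards over made-up state features.
# The names and the penalty form are illustrative assumptions, not the paper's method.

def dot(weights, features):
    """Plain dot product of two equal-length lists."""
    return sum(w * f for w, f in zip(weights, features))

def demonstrator_reward(task_w, strategy_w, features):
    """Reward attributed to one demonstrator: shared task term plus own strategy term."""
    return dot(task_w, features) + dot(strategy_w, features)

def strategy_penalty(strategy_ws, states):
    """Mean absolute strategy reward over all demonstrators and states.

    Penalizing this quantity during training would push structure that is
    common to every demonstrator into the shared task reward (assumed regularizer).
    """
    total = sum(abs(dot(w, s)) for w in strategy_ws for s in states)
    return total / (len(strategy_ws) * len(states))

# Toy data: one shared task weight vector, two demonstrators with opposite preferences.
task_w = [1.0, 0.0]
strategy_ws = [[0.0, 0.5], [0.0, -0.5]]
states = [[1.0, 2.0], [0.0, 1.0]]

rewards = [[demonstrator_reward(task_w, w, s) for s in states] for w in strategy_ws]
penalty = strategy_penalty(strategy_ws, states)
```

In this toy setup the two demonstrators agree on the first feature (captured by the shared weights) and disagree on the second (captured by their strategy weights), which is the kind of split the abstract describes at a high level.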


research
05/31/2018

Learning a Prior over Intent via Meta-Inverse Reinforcement Learning

A significant challenge for the practical application of reinforcement l...
research
09/22/2022

Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning

While Reinforcement Learning (RL) aims to train an agent from a reward f...
research
07/22/2023

DIP-RL: Demonstration-Inferred Preference Learning in Minecraft

In machine learning for sequential decision-making, an algorithmic agent...
research
06/23/2019

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

Our goal is for agents to optimize the right reward function, despite ho...
research
03/22/2023

Communication Load Balancing via Efficient Inverse Reinforcement Learning

Communication load balancing aims to balance the load between different ...
research
11/08/2021

Batch Reinforcement Learning from Crowds

A shortcoming of batch reinforcement learning is its requirement for rew...
research
12/06/2022

Misspecification in Inverse Reinforcement Learning

The aim of Inverse Reinforcement Learning (IRL) is to infer a reward fun...
