Inverse Reinforcement Learning with Multiple Ranked Experts

07/31/2019
by   Pablo Samuel Castro, et al.
5

We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance. We assume the demonstrators are classified into one of k ranks, and use ideas from ordinal regression to find a reward function that maximizes the margin between the different ranks. This approach is based on the idea that agents should not only learn how to behave from experts, but also how not to behave from non-experts. We show there are MDPs where important differences in the reward function would be hidden from existing algorithms by the behaviour of the expert. Our method is particularly useful for problems where we have access to a large set of agent behaviours with varying degrees of expertise (such as through GPS or cellphones). We highlight the differences between our approach and existing methods using a simple grid domain and demonstrate its efficacy on determining passenger-finding strategies for taxi drivers, using a large dataset of GPS trajectories.

READ FULL TEXT

page 6

page 7

research
09/22/2022

Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning

While Reinforcement Learning (RL) aims to train an agent from a reward f...
research
05/25/2021

Trajectory Modeling via Random Utility Inverse Reinforcement Learning

We consider the problem of modeling trajectories of drivers in a road ne...
research
12/15/2017

Inverse Reinforce Learning with Nonparametric Behavior Clustering

Inverse Reinforcement Learning (IRL) is the task of learning a single re...
research
05/14/2023

Inverse Reinforcement Learning With Constraint Recovery

In this work, we propose a novel inverse reinforcement learning (IRL) al...
research
03/22/2023

Communication Load Balancing via Efficient Inverse Reinforcement Learning

Communication load balancing aims to balance the load between different ...
research
09/27/2021

Learning Multimodal Rewards from Rankings

Learning from human feedback has shown to be a useful approach in acquir...
research
12/04/2017

Inferring agent objectives at different scales of a complex adaptive system

We introduce a framework to study the effective objectives at different ...

Please sign up or login with your details

Forgot password? Click here to reset