Inverse Reinforce Learning with Nonparametric Behavior Clustering

12/15/2017
by Siddharthan Rajasekaran, et al.

Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) whose reward function is unspecified and a set of demonstrations generated by humans or experts. In practice, however, it may be unreasonable to assume that human behaviors can be explained by one reward function, since they may be inherently inconsistent. Demonstrations may also be collected from various users and aggregated to infer and predict users' behaviors. In this paper, we introduce the Non-parametric Behavior Clustering IRL algorithm, which simultaneously clusters demonstrations and learns multiple reward functions from demonstrations that may be generated by more than one behavior. Our method is iterative: it alternates between clustering demonstrations into different behavior clusters and inverse learning the reward functions until convergence. It is built upon the Expectation-Maximization formulation and non-parametric clustering in the IRL setting. Further, to improve computational efficiency, we remove the need to completely solve multiple IRL problems for multiple clusters during the iteration steps, and we introduce a resampling technique to avoid generating too many unlikely clusters. We demonstrate the convergence and efficiency of the proposed method by learning multiple driver behaviors from demonstrations generated in a grid-world environment and from continuous trajectories collected from autonomous robot cars using the Gazebo robot simulator.
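
The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the number of clusters is fixed (the non-parametric prior and the resampling step are omitted), each demonstration is summarised by a feature-count vector, rewards are linear in those features, and the trajectory likelihood is a maximum-entropy model normalised over a small, hypothetical set of candidate trajectories. All function and variable names are illustrative. In the spirit of not fully solving one IRL problem per cluster, the M-step takes a single gradient step per iteration.

```python
import math

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def log_lik(w, demo_f, candidates):
    # log P(demo | w) under a MaxEnt model over a finite candidate set
    # (assumption: the partition function is a sum over `candidates`).
    scores = [dot(w, f) for f in candidates]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return dot(w, demo_f) - log_z

def em_cluster_irl(demos, candidates, init_weights, iters=200, lr=0.5):
    weights = [list(w) for w in init_weights]
    k = len(weights)
    priors = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: soft-assign each demonstration to the behavior clusters.
        resp = []
        for f in demos:
            logp = [math.log(priors[j]) + log_lik(weights[j], f, candidates)
                    for j in range(k)]
            resp.append(softmax(logp))
        # M-step: update cluster priors, then take ONE gradient step on each
        # cluster's reward weights instead of solving IRL to convergence.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            priors[j] = max(mass / len(demos), 1e-12)
            probs = softmax([dot(weights[j], f) for f in candidates])
            for d in range(len(weights[j])):
                # Responsibility-weighted empirical feature expectation.
                empirical = sum(r[j] * f[d] for r, f in zip(resp, demos))
                empirical /= max(mass, 1e-12)
                # Feature expectation under the current reward weights.
                expected = sum(p * f[d] for p, f in zip(probs, candidates))
                weights[j][d] += lr * (empirical - expected)
    return weights, priors, resp

# Toy usage: two behaviors, each preferring a different feature.
demos = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, priors, resp = em_cluster_irl(
    demos, candidates, init_weights=[[0.1, 0.0], [0.0, 0.1]])
```

In the toy usage, the slightly asymmetric initial weights break the symmetry between the two clusters; each row of `resp` holds the soft cluster assignment for the corresponding demonstration.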


