BC-IRL: Learning Generalizable Reward Functions from Demonstrations

03/28/2023
by Andrew Szot, et al.

How well do reward functions learned with inverse reinforcement learning (IRL) generalize? We illustrate that state-of-the-art IRL algorithms, which maximize a maximum-entropy objective, learn rewards that overfit to the demonstrations. Such rewards struggle to provide a meaningful signal for states not covered by the demonstrations, a major drawback when using the reward to learn policies in new situations. We introduce BC-IRL, a new inverse reinforcement learning method that learns reward functions that generalize better than those learned by maximum-entropy IRL approaches. In contrast to the MaxEnt framework, which learns to maximize rewards around demonstrations, BC-IRL updates reward parameters such that the policy trained with the new reward matches the expert demonstrations better. We show that BC-IRL learns rewards that generalize better on an illustrative simple task and two continuous robotic control tasks, achieving over twice the success rate of baselines in challenging generalization settings.
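The reward update described in the abstract can be read as a two-level loop: the policy takes a step on the current learned reward, and the reward parameters are then adjusted so that the updated policy lands closer to the expert. The sketch below is a deliberately tiny illustration of that idea and not the authors' implementation. Everything in it is an assumption made for the example: a scalar "policy" `theta` that is simply the action it outputs, a hypothetical quadratic reward `r_psi(a) = -(a - psi)^2`, a single expert action, a squared-error behavioral cloning loss, and the step sizes `alpha` and `beta`. Because the toy is one-dimensional, the gradient through the policy update is written out by hand with the chain rule.

```python
def inner_policy_step(theta, psi, alpha):
    """One gradient-ascent step of the toy policy on the learned reward.

    Assumed reward (hypothetical): r_psi(a) = -(a - psi)**2, with a = theta.
    """
    grad_reward = -2.0 * (theta - psi)  # d r_psi / d theta
    return theta + alpha * grad_reward


def bc_loss(theta, expert_action):
    """Behavioral cloning loss: squared distance to the expert action."""
    return (theta - expert_action) ** 2


def outer_reward_step(theta, psi, expert_action, alpha, beta):
    """Update the reward parameter so the updated policy matches the expert better."""
    theta_new = inner_policy_step(theta, psi, alpha)
    # Chain rule through the policy update:
    # dL/dpsi = dL/dtheta_new * dtheta_new/dpsi
    dloss_dtheta = 2.0 * (theta_new - expert_action)
    dtheta_dpsi = 2.0 * alpha
    psi_new = psi - beta * dloss_dtheta * dtheta_dpsi
    return theta_new, psi_new


def train(expert_action=1.5, theta=0.0, psi=-1.0, alpha=0.1, beta=0.5, steps=500):
    """Alternate policy and reward updates; return the final (theta, psi)."""
    for _ in range(steps):
        theta, psi = outer_reward_step(theta, psi, expert_action, alpha, beta)
    return theta, psi


if __name__ == "__main__":
    theta, psi = train()
    print(f"policy action: {theta:.4f}, reward peak: {psi:.4f}")
```

In this toy, the reward is never fit to the demonstration directly. It changes only through its effect on the policy, which is the contrast with the MaxEnt objective that the abstract draws. A realistic version would replace the hand-written chain rule with automatic differentiation through the policy optimizer.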


