Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble

06/01/2022
by Fan-Ming Luo, et al.

Inverse reinforcement learning (IRL) recovers the underlying reward function from expert demonstrations. A generalizable reward function is especially desirable because it captures the fundamental motivation of the expert. However, classical IRL methods can only recover reward functions that are coupled with the training dynamics, so the learned rewards are hard to generalize to a changed environment. Previous dynamics-agnostic reward learning methods rely on strict assumptions, for example that the reward function must be state-only. This work proposes a general approach to learning transferable reward functions, Dynamics-Agnostic Discriminator-Ensemble Reward Learning (DARL). Following the adversarial imitation learning (AIL) framework, DARL learns a dynamics-agnostic discriminator on a latent space mapped from the original state-action space. The latent space is learned so that it contains as little information about the dynamics as possible. Moreover, to reduce the reliance of the discriminator on policies, the reward function is represented as an ensemble of discriminators during training. We evaluate DARL on four MuJoCo tasks with dynamics transfer. Comparisons with state-of-the-art AIL methods show that DARL learns a reward that is more consistent with the true reward and thus obtains higher environment returns.
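
The idea of representing the reward as an ensemble of discriminators can be illustrated with a minimal sketch. The abstract does not specify how the ensemble is aggregated or how rewards are derived from each discriminator, so everything below is an assumption for illustration only: the toy linear discriminators, the helper names `make_linear_discriminator` and `ensemble_reward`, the use of the raw logit as the per-discriminator reward, and the plain averaging rule are all hypothetical and not taken from the paper.

```python
from statistics import fmean


def make_linear_discriminator(weights, bias):
    """Build a toy discriminator mapping a latent vector z to a logit.

    Hypothetical stand-in for a trained network; a real discriminator
    would be a neural network over the learned latent space.
    """
    def logit(z):
        return sum(w * x for w, x in zip(weights, z)) + bias
    return logit


def ensemble_reward(discriminators, z):
    """Average the logits of all discriminators for one latent input.

    Assumption: simple averaging of logits; the paper may aggregate
    the ensemble differently.
    """
    return fmean([d(z) for d in discriminators])


# Three hypothetical discriminator snapshots collected during training.
snapshots = [
    make_linear_discriminator([1.0, 0.0], 0.0),
    make_linear_discriminator([0.0, 1.0], 0.0),
    make_linear_discriminator([1.0, 1.0], -1.0),
]

# Reward for one latent state-action embedding.
reward = ensemble_reward(snapshots, [1.0, 2.0])
```

Averaging over several snapshots means that no single discriminator, fitted against one particular policy, fully determines the reward, which is the intuition behind reducing the reliance on policies.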

research
06/02/2023

PAGAR: Imitation Learning with Protagonist Antagonist Guided Adversarial Reward

Imitation learning (IL) algorithms often rely on inverse reinforcement l...
research
04/14/2021

Reward function shape exploration in adversarial imitation learning: an empirical study

For adversarial imitation learning algorithms (AILs), no true rewards ar...
research
04/27/2018

Decoupling Dynamics and Reward for Transfer Learning

Current reinforcement learning (RL) methods can successfully learn singl...
research
11/17/2020

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

The problem of inverse reinforcement learning (IRL) is relevant to a var...
research
10/24/2018

Inverse reinforcement learning for video games

Deep reinforcement learning achieves superhuman performance in a range o...
research
04/24/2018

No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

Though impressive results have been achieved in visual captioning, the t...
research
03/01/2023

LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Recent methods for imitation learning directly learn a Q-function using ...
