Adversarial Imitation via Variational Inverse Reinforcement Learning

09/17/2018
by Ahmed H. Qureshi, et al.

We consider the problem of learning a reward function and a policy from expert examples under unknown dynamics in high-dimensional settings. Our proposed method builds on the framework of generative adversarial networks and exploits reward shaping to learn near-optimal rewards and policies. Potential-based reward shaping functions are known to guide the learning agent; in this paper, we show that they also help in learning near-optimal rewards. Our method learns a potential-based reward shaping function through variational information maximization, simultaneously with the reward and the policy, under an adversarial learning formulation. We evaluate our method on various complex, high-dimensional control tasks. We also evaluate the learned rewards in transfer learning problems where the training and testing environments differ in dynamics or structure. Our experiments show that the proposed method not only learns near-optimal rewards and policies that match expert behavior, but also performs significantly better than state-of-the-art inverse reinforcement learning algorithms.
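For readers unfamiliar with potential-based reward shaping, the following is a minimal sketch of the classic form, in which a shaped reward adds gamma * phi(s') - phi(s) to the original reward. It is not the method described in the abstract: the potential `phi` here is hand-written and hypothetical, whereas the paper learns its shaping function, and the toy states, rewards, and discount factor are assumptions chosen only for illustration. The sketch checks that shaping changes the discounted return of a trajectory only by a term that depends on its first and last states.

```python
from typing import Callable, Sequence


def shaped_reward(r: float, s: int, s_next: int,
                  phi: Callable[[int], float], gamma: float) -> float:
    """Classic potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return r + gamma * phi(s_next) - phi(s)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over a finite trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def phi(s: int) -> float:
    # Hypothetical potential: negative distance to an assumed goal state 3.
    return -abs(s - 3)


# Toy trajectory (assumed for illustration): states 0 -> 1 -> 2 -> 3.
states = [0, 1, 2, 3]
rewards = [0.0, 0.0, 1.0]
gamma = 0.9

shaped = [shaped_reward(r, states[t], states[t + 1], phi, gamma)
          for t, r in enumerate(rewards)]

base_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope, leaving only the two endpoint potentials.
offset = (gamma ** len(rewards)) * phi(states[-1]) - phi(states[0])
```

Because the difference between the two returns depends only on the endpoint potentials, a shaping term of this form can supply denser guidance during learning without changing which policy is optimal. That is the sense in which such functions are said to guide the learning agent.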


