IL-flOw: Imitation Learning from Observation using Normalizing Flows

05/19/2022
by   Wei-Di Chang, et al.

We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only. Our approach decouples reward modelling from policy learning, unlike state-of-the-art adversarial methods, which require updating the reward model during policy search and are known to be unstable and difficult to optimize. Our method, IL-flOw, recovers the expert policy by modelling state-to-state transitions: it generates rewards using deep density estimators trained on the demonstration trajectories, which avoids the instability of adversarial methods. We demonstrate that using the state transition log-probability density as a reward signal for forward reinforcement learning translates to matching the trajectory distribution of the expert demonstrations, and we experimentally show good recovery of the true reward signal as well as state-of-the-art results for imitation from observation on locomotion and robotic continuous control tasks.
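To illustrate the idea of a fixed, density-based reward, the following is a minimal sketch and not the paper's implementation. It assumes a single elementwise affine flow, z = (s' - s - mu) / sigma with a standard normal base, as a stand-in for the deep density estimators mentioned above. The function names, the toy one-dimensional expert, and the choice to model the state change s' - s are all assumptions made for illustration. The flow is fitted once on expert transitions, and its log-density is then used as the reward, so the reward model is never updated during policy search.

```python
import math
import random


def fit_transition_flow(transitions):
    """Fit z = (s' - s - mu) / sigma per dimension by maximum likelihood.

    `transitions` is a list of (state, next_state) pairs, each a list of floats.
    For a standard normal base, the maximum-likelihood parameters are the mean
    and (biased) standard deviation of the state changes.
    """
    deltas = [[b - a for a, b in zip(s, s_next)] for s, s_next in transitions]
    n, dim = len(deltas), len(deltas[0])
    mu = [sum(d[i] for d in deltas) / n for i in range(dim)]
    sigma = [
        max(math.sqrt(sum((d[i] - mu[i]) ** 2 for d in deltas) / n), 1e-6)
        for i in range(dim)
    ]
    return mu, sigma


def transition_log_prob(s, s_next, mu, sigma):
    """Log-density of s' given s under the fitted affine flow.

    Change of variables: log p(s' | s) = log N(z; 0, I) + log |det dz/ds'|,
    where the log-determinant of the affine map is -sum(log sigma).
    """
    total = 0.0
    for a, b, m, sd in zip(s, s_next, mu, sigma):
        z = (b - a - m) / sd
        total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sd)
    return total


# Hypothetical toy expert: a 1-D state that advances by about +1 per step.
rng = random.Random(0)
states = [[0.0]]
for _ in range(200):
    states.append([states[-1][0] + 1.0 + rng.gauss(0.0, 0.1)])
expert_transitions = list(zip(states[:-1], states[1:]))

# Train the density model once, on demonstrations only.
mu, sigma = fit_transition_flow(expert_transitions)

# The reward for any transition visited by the learner is its log-density.
reward_expert_like = transition_log_prob([10.0], [11.0], mu, sigma)
reward_off_expert = transition_log_prob([10.0], [9.0], mu, sigma)
```

In this sketch a transition that resembles the expert's (moving forward by one) receives a higher reward than one that moves backward. Any forward reinforcement learning algorithm could then maximize this reward, and a richer conditional flow would replace the affine map for realistic tasks.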


research
02/09/2022

Imitation Learning by State-Only Distribution Matching

Imitation Learning from observation describes policy learning in a simil...
research
08/24/2023

Conditional Kernel Imitation Learning for Continuous State Environments

Imitation Learning (IL) is an important paradigm within the broader rein...
research
09/24/2019

Avoidance Learning Using Observational Reinforcement Learning

Imitation learning seeks to learn an expert policy from sampled demonstr...
research
12/31/2019

Reward-Conditioned Policies

Reinforcement learning offers the promise of automating the acquisition ...
research
11/02/2020

Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

The potential benefits of model-free reinforcement learning to real robo...
research
03/03/2023

Learning Stabilization Control from Observations by Learning Lyapunov-like Proxy Models

The deployment of Reinforcement Learning to robotics applications faces ...
research
04/25/2022

Imitation Learning from Observations under Transition Model Disparity

Learning to perform tasks by leveraging a dataset of expert observations...
