Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

02/16/2023
by   Harshit Sikchi, et al.

It is well known that Reinforcement Learning (RL) can be formulated as a convex program with linear constraints. The dual form of this formulation is unconstrained, which we refer to as dual RL, and can leverage preexisting tools from convex optimization to improve the learning performance of RL agents. We show that several state-of-the-art deep RL algorithms (in online, offline, and imitation settings) can be viewed as dual RL approaches in a unified framework. This unification calls for the methods to be studied on common ground, to identify the components that actually contribute to their success. Our unification also reveals that prior off-policy imitation learning methods in the dual space rest on an unrealistic coverage assumption and are restricted to matching a particular f-divergence. We propose a new method, based on a simple modification to the dual framework, that enables imitation learning from arbitrary off-policy data and obtains near-expert performance.
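The primal-dual relationship the abstract refers to can be sketched as follows. This is the standard occupancy-measure linear program for RL and an f-divergence-regularized version of its dual; the notation and the regularized form are our illustration of the general idea, not necessarily the paper's exact formulation.

```latex
% Primal: RL as a convex program over state-action occupancies d(s,a),
% with linear Bellman-flow constraints.
\max_{d \ge 0} \;\; \sum_{s,a} d(s,a)\, r(s,a)
\quad \text{s.t.} \quad
\sum_{a} d(s,a) \;=\; (1-\gamma)\,\rho_0(s)
  \;+\; \gamma \sum_{s',a'} P(s \mid s', a')\, d(s', a')
\qquad \forall s.

% Regularizing the objective with an f-divergence to a reference
% distribution d^O and dualizing the flow constraints with Lagrange
% multipliers V(s) gives an UNCONSTRAINED problem ("dual RL"):
\min_{V} \;\; (1-\gamma)\, \mathbb{E}_{s \sim \rho_0}\!\bigl[ V(s) \bigr]
\;+\; \mathbb{E}_{(s,a) \sim d^O}\!\Bigl[
  f^{*}\!\bigl( r(s,a) + \gamma\, \mathbb{E}_{s'}\!\bigl[ V(s') \bigr] - V(s) \bigr)
\Bigr],

% where f^* is the convex conjugate of f. With r \equiv 0 and d^O set
% to expert data, objectives of this shape recover DICE-style
% off-policy imitation, which is the family the abstract critiques.
```

The key point of the sketch is that the primal's linear constraints become the argument of the convex conjugate in the dual, so the dual can be optimized without constraints using standard tools from convex optimization.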

