Imitating Past Successes can be Very Suboptimal

06/07/2022
by Benjamin Eysenbach, et al.

Prior work has proposed a simple strategy for reinforcement learning (RL): label experience with the outcomes achieved in that experience, and then imitate the relabeled experience. These outcome-conditioned imitation learning methods are appealing because of their simplicity, strong performance, and close ties with supervised learning. However, it remains unclear how these methods relate to the standard RL objective, reward maximization. In this paper, we prove that existing outcome-conditioned imitation learning methods do not necessarily improve the policy; rather, in some settings they can decrease the expected reward. Nonetheless, we show that a simple modification results in a method that does guarantee policy improvement, under some assumptions. Our aim is not to develop an entirely new method, but rather to explain how a variant of outcome-conditioned imitation learning can be used to maximize rewards.
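
The relabel-then-imitate strategy described in the abstract can be sketched in a few lines. This is a minimal tabular illustration, not the paper's implementation. It assumes the "outcome" of a trajectory is simply its final state, and the names `relabel` and `fit_policy` and the toy trajectories are hypothetical. It shows only the basic strategy, without the modification the abstract says is needed to guarantee policy improvement.

```python
from collections import Counter, defaultdict


def relabel(trajectories):
    """Label each (state, action) pair with the outcome its trajectory reached.

    Assumption: the outcome is the final state of the trajectory.
    """
    data = []
    for states, actions in trajectories:
        outcome = states[-1]
        # states has one more entry than actions, so drop the final state.
        for state, action in zip(states[:-1], actions):
            data.append((state, outcome, action))
    return data


def fit_policy(data):
    """Tabular maximum-likelihood imitation: estimate pi(a | s, outcome) from counts."""
    counts = defaultdict(Counter)
    for state, outcome, action in data:
        counts[(state, outcome)][action] += 1
    return {
        key: {a: n / sum(c.values()) for a, n in c.items()}
        for key, c in counts.items()
    }


# Hypothetical toy data: each trajectory is (states, actions),
# with len(states) == len(actions) + 1.
trajectories = [
    (["s0", "s1", "goal"], ["right", "right"]),
    (["s0", "s1", "goal"], ["right", "right"]),
    (["s0", "s0", "trap"], ["left", "left"]),
]

policy = fit_policy(relabel(trajectories))
```

Conditioning the fitted policy on a desired outcome, for example looking up `policy[("s0", "goal")]`, is how such a method would be asked to reproduce a past success. Whether doing so actually increases expected reward is the question the paper examines.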


research
02/16/2023

Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods

It is well known that Reinforcement Learning (RL) can be formulated as a...
research
02/24/2022

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

Upside down reinforcement learning (UDRL) flips the conventional use of ...
research
09/26/2022

Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization

Hindsight goal relabeling has become a foundational technique for multi-...
research
09/25/2022

Unsupervised Reward Shaping for a Robotic Sequential Picking Task from Visual Observations in a Logistics Scenario

We focus on an unloading problem, typical of the logistics sector, model...
research
12/10/2019

Imitation Learning via Off-Policy Distribution Matching

When performing imitation learning from expert demonstrations, distribut...
research
02/10/2021

Learning Equational Theorem Proving

We develop Stratified Shortest Solution Imitation Learning (3SIL) to lea...
research
09/16/2018

Deep Learning with Experience Ranking Convolutional Neural Network for Robot Manipulator

Supervised learning, more specifically Convolutional Neural Networks (CN...
