Goal-conditioned Imitation Learning

06/13/2019
by Yiming Ding, et al.

Designing rewards for Reinforcement Learning (RL) is challenging because a reward needs to convey the desired task, be efficient to optimize, and be easy to compute. The last requirement is particularly problematic when applying RL to robotics, where detecting whether the desired configuration has been reached might require considerable supervision and instrumentation. Furthermore, we are often interested in reaching a wide range of configurations, so setting up a different reward every time might be impractical. Methods like Hindsight Experience Replay (HER) have recently shown promise for learning policies that can reach many goals without the need for a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might take a very long time to discover how to reach certain areas of the state space. In this work we investigate different approaches to incorporating demonstrations to drastically speed up convergence to a policy able to reach any goal, also surpassing the performance of an agent trained with other Imitation Learning algorithms. Furthermore, our method can be used when only trajectories without expert actions are available, which makes it possible to leverage kinesthetic or third-person demonstrations. The code is available at https://sites.google.com/view/goalconditioned-il/.
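To make the hindsight idea mentioned in the abstract concrete, the following is a minimal sketch of goal relabeling in general, not the authors' implementation. It assumes a trajectory is a list of `(state, action, next_state)` tuples and that any state reached later in the same trajectory can serve as a goal. The function name `relabel_with_future_goals`, the parameter `k`, and the exact-match sparse reward are all hypothetical choices made for illustration.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, State]            # (state, action, next_state)
Relabeled = Tuple[State, int, State, State, float]  # adds (goal, reward)


def relabel_with_future_goals(
    trajectory: List[Transition],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Relabeled]:
    """Relabel each transition with k goals sampled from its own future.

    Assumption: a goal is simply a state that the trajectory actually
    reached at or after the current step, so no hand-designed reward
    function is needed. The reward here is a sparse indicator computed
    from the data itself.
    """
    rng = rng or random.Random(0)
    relabeled: List[Relabeled] = []
    for t, (state, action, next_state) in enumerate(trajectory):
        for _ in range(k):
            # Pick a step at or after t and use the state it reached as the goal.
            future = rng.randrange(t, len(trajectory))
            goal = trajectory[future][2]
            reward = 1.0 if next_state == goal else 0.0
            relabeled.append((state, action, next_state, goal, reward))
    return relabeled
```

Under the same assumptions, the routine could be applied to a demonstration trajectory as well as to the agent's own rollouts, since it only needs the visited states and the actions that were taken.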


