Policy Continuation with Hindsight Inverse Dynamics

10/30/2019
by   Hao Sun, et al.
8

Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). This approach learns from Hindsight Inverse Dynamics based on Hindsight Experience Replay. Enabling the learning process in a self-imitated manner and thus can be trained with supervised learning. This work also extends it to multi-step settings with Policy Continuation. The proposed method is general, which can work in isolation or be combined with other on-policy and off-policy algorithms. On two multi-goal tasks GridWorld and FetchReach, PCHID significantly improves the sample efficiency as well as the final performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/01/2021

MHER: Model-based Hindsight Experience Replay

Solving multi-goal reinforcement learning (RL) problems with sparse rewa...
research
02/25/2020

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Multi-task reinforcement learning (RL) aims to simultaneously learn poli...
research
08/28/2020

Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back

Sparse rewards present a difficult problem in reinforcement learning and...
research
07/06/2020

Counterfactual Data Augmentation using Locally Factored Dynamics

Many dynamic processes, including common scenarios in robotic control an...
research
06/02/2022

Uniqueness and Complexity of Inverse MDP Models

What is the action sequence aa'a" that was likely responsible for reachi...
research
03/03/2023

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Reinforcement learning has shown great potential in solving complex task...
research
04/27/2020

Evolutionary Stochastic Policy Distillation

Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging ...

Please sign up or login with your details

Forgot password? Click here to reset