Policy Gradient from Demonstration and Curiosity

04/22/2020
by Jie Chen, et al.

With reinforcement learning, an agent could learn complex behaviors from high-level abstractions of the task. However, exploration and reward shaping remained challenging for existing methods, especially in scenarios where the extrinsic feedback was sparse. Expert demonstrations have been investigated as a way to address these difficulties, but a very large number of high-quality demonstrations was usually required. In this work, an integrated policy gradient algorithm was proposed to boost exploration and facilitate intrinsic reward learning from only a limited number of demonstrations. We achieved this by reformulating the original reward function with two additional terms: the first term measured the Jensen-Shannon divergence between the current policy and the expert, and the second term estimated the agent's uncertainty about the environment. The presented algorithm was evaluated on a range of simulated tasks with sparse extrinsic reward signals, in which only a single demonstrated trajectory was provided for each task. Superior exploration efficiency and high average return were demonstrated in all tasks. Furthermore, it was found that the agent could imitate the expert's behavior while sustaining a high return.
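The abstract describes the reformulated reward only in words, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes discrete distributions for the divergence term, uses the squared prediction error of a hypothetical forward model as a stand-in for the uncertainty term, and combines them with hypothetical weights `lambda_demo` and `lambda_cur`; none of these names or choices come from the paper.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 because b is the mixture.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def curiosity_bonus(predicted_next_state, actual_next_state):
    """Assumed uncertainty proxy: half the squared error of a forward-model prediction."""
    diff = np.asarray(predicted_next_state, dtype=float) - np.asarray(actual_next_state, dtype=float)
    return 0.5 * float(np.dot(diff, diff))


def shaped_reward(extrinsic, js_term, curiosity, lambda_demo=0.1, lambda_cur=0.01):
    """Extrinsic reward, minus a penalty for diverging from the expert, plus an exploration bonus.

    lambda_demo and lambda_cur are hypothetical weights chosen for illustration.
    """
    return extrinsic - lambda_demo * js_term + lambda_cur * curiosity
```

In this sketch the divergence term enters with a negative sign, so behavior closer to the expert earns more reward, while the curiosity term enters with a positive sign, so poorly predicted transitions are encouraged. How the two terms are actually estimated and weighted in the paper is not stated in the abstract.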

research
12/03/2022

Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Reinforcement learning often suffers from the sparse reward issue in real...
research
06/12/2022

Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradien...
research
07/07/2020

Guided Exploration with Proximal Policy Optimization using a Single Demonstration

Solving sparse reward tasks through exploration is one of the major chal...
research
06/14/2023

Curricular Subgoals for Inverse Reinforcement Learning

Inverse Reinforcement Learning (IRL) aims to reconstruct the reward func...
research
04/24/2018

No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

Though impressive results have been achieved in visual captioning, the t...
research
06/14/2020

Reinforcement Learning with Supervision from Noisy Demonstrations

Reinforcement learning has achieved great success in various application...
research
02/09/2023

CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning

This work aims to tackle a major challenge in offline Inverse Reinforcem...
