Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

02/09/2022
by Desik Rengarajan, et al.

A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback means that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame, because of the large number of exploratory actions the policy must perform before it receives any useful feedback to learn from. In this work, we address this problem by developing an algorithm that exploits offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse-reward settings. The proposed algorithm, which we call Learning Online with Guidance Offline (LOGO), merges a policy improvement step with an additional policy guidance step that uses the offline demonstration data. The key idea is that, by obtaining guidance from the offline data rather than imitating it, LOGO orients its policy in the direction of the sub-optimal policy while still being able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm, including a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete-observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored state observations. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.
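To make the two-step structure concrete, the following is a minimal, illustrative sketch in Python/PyTorch, not the authors' implementation: a plain policy-gradient step stands in for the paper's trust-region update, the behavior policy is recovered from the demonstrations by behavior cloning, and a fixed geometric decay of the guidance weight (`delta_k` below) stands in for the paper's adaptive schedule. All dimensions, batch contents, and hyperparameters are placeholders.

```python
# Illustrative sketch of LOGO's alternating update (not the authors' code).
# Assumptions: discrete actions, synthetic batches in place of real rollouts,
# and a vanilla policy-gradient step instead of the paper's trust-region step.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, BATCH = 4, 3, 128

def make_policy():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                         nn.Linear(64, N_ACTIONS))

# Recover a behavior policy from the sub-optimal demonstrations by behavior
# cloning (the demonstrations guide learning; they are never treated as optimal).
demo_states = torch.randn(256, STATE_DIM)              # placeholder demo data
demo_actions = torch.randint(0, N_ACTIONS, (256,))
behavior = make_policy()
bc_opt = torch.optim.Adam(behavior.parameters(), lr=1e-3)
for _ in range(200):
    bc_opt.zero_grad()
    F.cross_entropy(behavior(demo_states), demo_actions).backward()
    bc_opt.step()
behavior.requires_grad_(False)

policy = make_policy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
delta_k = 0.5                                          # initial guidance strength

for k in range(100):
    # Step 1: policy improvement on the (sparse) environment reward.
    # Placeholder batch; in practice states/actions/advantages come from rollouts.
    states = torch.randn(BATCH, STATE_DIM)
    actions = torch.randint(0, N_ACTIONS, (BATCH,))
    advantages = torch.randn(BATCH)
    logp = F.log_softmax(policy(states), dim=-1)
    pg_loss = -(logp[torch.arange(BATCH), actions] * advantages).mean()

    # Step 2: policy guidance, pulling pi_theta toward the behavior policy
    # on demonstration states via a KL penalty weighted by delta_k.
    kl = F.kl_div(F.log_softmax(behavior(demo_states), dim=-1),  # log pi_b
                  F.log_softmax(policy(demo_states), dim=-1),    # log pi_theta
                  log_target=True, reduction='batchmean')        # KL(pi_theta || pi_b)

    opt.zero_grad()
    (pg_loss + delta_k * kl).backward()
    opt.step()
    delta_k *= 0.99   # decay guidance so the policy can surpass the demonstrator
```

In the paper, both steps are trust-region constrained and the guidance weight is reduced adaptively as the learned policy catches up with the demonstrator; the sketch above collapses these into single penalized gradient steps to expose the core idea.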

Related Research

09/26/2022
Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
Meta reinforcement learning (Meta-RL) is an approach wherein the experie...

06/23/2023
CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to learn an optimal policy from...

03/24/2023
Optimal Transport for Offline Imitation Learning
With the advent of large datasets, offline reinforcement learning (RL) i...

05/24/2023
Provable Offline Reinforcement Learning with Human Feedback
In this paper, we investigate the problem of offline reinforcement learn...

04/18/2019
Improving Interactive Reinforcement Agent Planning with Human Demonstration
TAMER has proven to be a powerful interactive reinforcement learning met...

10/01/2019
Accelerated Robot Learning via Human Brain Signals
In reinforcement learning (RL), sparse rewards are a natural way to spec...

07/18/2018
Backplay: "Man muss immer umkehren"
A long-standing problem in model free reinforcement learning (RL) is tha...
