Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

09/26/2022
by Desik Rengarajan, et al.

Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small number of steps (or even a single step), is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that such problems are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms entitled Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information, even if sub-optimal, to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm-started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including one involving a mobile robot.
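The abstract does not give EMRLD's update rule. As a rough illustration of the general idea of combining a policy-gradient (RL) term with a supervised behavior-cloning term on per-task demonstration data inside a meta-learning loop, here is a toy sketch in Python with NumPy. Everything in it is an assumption for illustration only: the tabular softmax policy, the contextual-bandit tasks with a 0/1 success reward, the hypothetical weight `lam`, the step sizes `alpha` and `beta`, the helper names (`make_task`, `success_prob`, etc.), and the choice of first-order MAML as the meta-update. It is not the authors' EMRLD algorithm and makes no claim about its monotone-improvement guarantee.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def score(probs_rows, actions, n_actions):
    """Gradient of log pi(a|s) w.r.t. the tabular logits theta[s]: onehot(a) - pi(.|s)."""
    return np.eye(n_actions)[actions] - probs_rows


def rl_grad(theta, goal, rng, n_samples=64):
    """REINFORCE estimate of the gradient of expected sparse reward (1 on success, else 0)."""
    n_states, n_actions = theta.shape
    probs = softmax(theta)
    states = rng.integers(n_states, size=n_samples)
    # Inverse-CDF sampling of one action per sampled state.
    cdf = np.cumsum(probs[states], axis=1)
    actions = np.minimum((rng.random((n_samples, 1)) > cdf).sum(axis=1), n_actions - 1)
    rewards = (actions == goal[states]).astype(float)
    grad = np.zeros_like(theta)
    np.add.at(grad, states, rewards[:, None] * score(probs[states], actions, n_actions))
    return grad / n_samples


def bc_grad(theta, demo_s, demo_a):
    """Gradient of the mean negative log-likelihood of the (possibly sub-optimal) demo actions."""
    n_actions = theta.shape[1]
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    np.add.at(grad, demo_s, -score(probs[demo_s], demo_a, n_actions))
    return grad / len(demo_s)


def loss_grad(theta, task, rng, lam):
    """Gradient of the hypothetical combined objective -J_RL(theta) + lam * L_BC(theta)."""
    return -rl_grad(theta, task["goal"], rng) + lam * bc_grad(theta, task["demo_s"], task["demo_a"])


def adapt(theta, task, rng, alpha=0.5, lam=1.0):
    """One inner-loop gradient step on a single task."""
    return theta - alpha * loss_grad(theta, task, rng, lam)


def meta_train(tasks, n_states, n_actions, rng, alpha=0.5, beta=0.5, lam=1.0, iters=200):
    """First-order MAML outer loop (swapped in here for brevity)."""
    theta = np.zeros((n_states, n_actions))
    for _ in range(iters):
        meta_grad = np.zeros_like(theta)
        for task in tasks:
            adapted = adapt(theta, task, rng, alpha, lam)
            # First-order approximation: use the post-adaptation gradient, drop second-order terms.
            meta_grad += loss_grad(adapted, task, rng, lam)
        theta = theta - beta * meta_grad / len(tasks)
    return theta


def make_task(goal, n_actions, reps=5):
    """Hypothetical toy task: the demonstrator picks a wrong action once per `reps` visits."""
    demo_s = np.repeat(np.arange(len(goal)), reps)
    demo_a = np.array([(g + 1) % n_actions if k == 0 else g for g in goal for k in range(reps)])
    return {"goal": np.array(goal), "demo_s": demo_s, "demo_a": demo_a}


def success_prob(theta, goal):
    """Exact expected sparse reward when states are drawn uniformly."""
    probs = softmax(theta)
    return probs[np.arange(len(goal)), goal].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three training tasks that agree on states 0-2 and disagree on state 3.
    tasks = [make_task(g, 3) for g in ([0, 1, 2, 0], [0, 1, 2, 1], [0, 1, 2, 2])]
    theta = meta_train(tasks, n_states=4, n_actions=3, rng=rng)
    new_task = make_task([0, 1, 2, 0], 3)
    adapted = adapt(theta, new_task, rng)
    print("uniform policy:", success_prob(np.zeros((4, 3)), new_task["goal"]))
    print("after one adaptation step:", success_prob(adapted, new_task["goal"]))
```

In this toy setup the behavior-cloning term on its own would pull each state's policy toward the demonstrator's empirical action frequencies (80% correct here), while the sparse-reward REINFORCE term keeps shifting probability toward the rewarded action, so the combined objective can end up doing better than the sub-optimal demonstrator. How EMRLD actually weights or schedules the two terms, and how EMRLD-WS warm-starts, is not described in the abstract.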
