Hyper-Meta Reinforcement Learning with Sparse Reward

02/11/2020
by Yun Hua, et al.

Despite their success, existing meta reinforcement learning methods still struggle to learn a meta policy effectively for RL problems with sparse reward. To address this, we develop a novel meta reinforcement learning framework, Hyper-Meta RL (HMRL), for sparse reward RL problems. It consists of three modules: meta state embedding, meta reward shaping, and meta policy learning. The cross-environment meta state embedding module constructs a common meta state space to adapt to different environments. The environment-specific meta reward shaping module, which operates on the meta state, effectively extends the original sparse reward trajectory through cross-environmental knowledge complementarity. As a consequence, the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments with sparse reward show the superiority of HMRL in both transferability and policy learning efficiency.

research
10/02/2020

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Meta-learning is a powerful tool for learning policies that can adapt ef...
research
12/02/2021

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Meta-reinforcement learning (meta-RL) has proven to be a successful fram...
research
09/10/2020

Importance Weighted Policy Learning and Adaption

The ability to exploit prior experience to solve novel problems rapidly ...
research
09/26/2022

Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

Meta reinforcement learning (Meta-RL) is an approach wherein the experie...
research
06/07/2022

Variational Meta Reinforcement Learning for Social Robotics

With the increasing presence of robots in our every-day environments, im...
research
05/29/2023

Continual Task Allocation in Meta-Policy Network via Sparse Prompting

How to train a generalizable meta-policy by continually learning a seque...
research
09/06/2021

Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning

Designing optimal reward functions has been desired but extremely diffic...
