Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces

01/06/2021
by Takahisa Imagawa, et al.

Meta-reinforcement learning (meta-RL) addresses the sample inefficiency of deep RL by reusing experience from past tasks when solving a new task. However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse data collected by past policies, which limits how much sample efficiency can improve. To alleviate this problem, we propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE). An ELUE agent learns a feature embedding space shared among tasks, a belief model over that embedding space, and a belief-conditional policy and Q-function. For a new task, it collects data with the pretrained policy and updates its belief using the belief model. Thanks to this belief update, performance can improve with only a small amount of data. In addition, once enough data are available, it updates the neural network parameters to adjust the pretrained relationships. We demonstrate that ELUE outperforms state-of-the-art meta-RL methods through experiments on meta-RL benchmarks.
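
The abstract does not state the form of the belief model, so the following is only an illustrative sketch of what a belief update over an embedding space could look like. It assumes a diagonal Gaussian belief over a task embedding and Gaussian evidence factors, combined by the standard product-of-Gaussians rule. The names `GaussianBelief` and `update_belief`, and the evidence values, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GaussianBelief:
    """Diagonal Gaussian belief over a task embedding (assumed form)."""
    mean: List[float]
    var: List[float]


def update_belief(belief: GaussianBelief,
                  obs_mean: Sequence[float],
                  obs_var: Sequence[float]) -> GaussianBelief:
    """Multiply the current belief by one Gaussian evidence factor.

    In a full agent the factor would presumably be produced by a learned
    model from an observed transition; here it is passed in directly.
    """
    new_mean, new_var = [], []
    for m, v, om, ov in zip(belief.mean, belief.var, obs_mean, obs_var):
        precision = 1.0 / v + 1.0 / ov          # precisions add
        new_var.append(1.0 / precision)
        new_mean.append((m / v + om / ov) / precision)  # precision-weighted mean
    return GaussianBelief(new_mean, new_var)


# Start from a broad prior and fold in three made-up evidence factors.
belief = GaussianBelief(mean=[0.0, 0.0], var=[1.0, 1.0])
for _ in range(3):
    belief = update_belief(belief, obs_mean=[1.0, -1.0], obs_var=[1.0, 1.0])

print(belief)
```

Under these assumptions, each factor shrinks the variance and pulls the mean toward the evidence, which mirrors the abstract's claim that a belief update alone can improve performance from a small amount of data. A belief-conditional policy would then take `belief.mean` and `belief.var` as extra inputs alongside the state.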


