Meta-Q-Learning

09/30/2019
by Rasool Fakoor, et al.

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-reinforcement learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective that maximizes the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of data available for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with state-of-the-art meta-RL algorithms.
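
To make the three ideas concrete, here is a minimal PyTorch sketch of each ingredient: a recurrent context encoder, a context-conditioned Q-function trained with a multi-task TD objective, and propensity-score weights for recycling meta-training data. All class and function names here are illustrative, not taken from the official repository, and details such as network sizes, batch shapes, and the policy interface are assumptions.

```python
# Minimal sketch of MQL's three ingredients (illustrative names, not the
# official implementation; shapes and network sizes are assumptions).
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Idea 1: summarize the recent trajectory into a context variable z,
    e.g. with a GRU over (state, action, reward) tuples."""
    def __init__(self, obs_dim, act_dim, ctx_dim):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim + 1, ctx_dim, batch_first=True)

    def forward(self, traj):   # traj: (batch, T, obs_dim + act_dim + 1)
        _, h = self.gru(traj)
        return h.squeeze(0)    # z: (batch, ctx_dim)

class ContextQ(nn.Module):
    """Q-function that conditions on the context variable z."""
    def __init__(self, obs_dim, act_dim, ctx_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

def multitask_td_loss(q, q_target, policy, task_batches, gamma=0.99):
    """Idea 2: meta-train by averaging the ordinary TD loss across the
    training tasks. task_batches holds one (s, a, r, s2, done, z) batch
    per task; policy is a callable mapping (state, context) to an action."""
    losses = []
    for s, a, r, s2, done, z in task_batches:
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * q_target(s2, policy(s2, z), z)
        losses.append(((q(s, a, z) - target) ** 2).mean())
    return torch.stack(losses).mean()

def propensity_weights(clf, old_transitions, eps=1e-6):
    """Idea 3: recycle meta-training data for a new task. clf is a binary
    classifier trained to output the logit of P(new task | transition);
    the importance weight is the odds ratio p / (1 - p)."""
    with torch.no_grad():
        p = torch.sigmoid(clf(old_transitions))
    return p / (1.0 - p + eps)
```

During adaptation, the TD loss on replayed meta-training transitions would be multiplied by these weights so that the old data behaves, on average, like data from the new task; this mirrors the propensity-estimation idea described in the abstract, though the exact estimator used in the paper may differ.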


Related Research

Efficient meta reinforcement learning via meta goal generation (09/30/2019)
Meta reinforcement learning (meta-RL) is able to accelerate the acquisit...

Linear Representation Meta-Reinforcement Learning for Instant Adaptation (01/12/2021)
This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta...

Context Meta-Reinforcement Learning via Neuromodulation (10/30/2021)
Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt ...

Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces (01/06/2021)
Meta-reinforcement learning (RL) addresses the problem of sample ineffic...

Bottom-Up Meta-Policy Search (10/22/2019)
Despite the recent progress in agents that learn through interaction,...

Learning to reinforcement learn (11/17/2016)
In recent years deep reinforcement learning (RL) systems have attained s...

Reinforcement Learning Algorithm Selection (01/30/2017)
This paper formalises the problem of online algorithm selection in the c...

Code Repositories

meta-q-learning

Code for the paper "Meta-Q-Learning" (ICLR 2020)

