All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

02/24/2022
by   Kai Arulkumaran, et al.

Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function of RL upside down, by taking returns as input and predicting actions. UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors. While previous work with UDRL demonstrated it in a traditional online RL setting, here we show that this single algorithm can also work in the imitation learning and offline RL settings, be extended to the goal-conditioned RL setting, and even to the meta-RL setting. With a general agent architecture, a single UDRL agent can learn across all paradigms.
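To make the core idea concrete, below is a minimal sketch of a command-conditioned UDRL policy: a network that takes the state together with a desired return and horizon as input, and is trained with ordinary supervised learning (cross-entropy on actions from previously collected episodes). All class and function names, network sizes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal UDRL-style sketch (assumed names and shapes, not the paper's code):
# the policy is conditioned on a command (desired return, desired horizon)
# and trained to predict the action actually taken in replayed experience.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommandConditionedPolicy(nn.Module):
    """Maps (state, desired return, desired horizon) to action logits."""
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        # +2 inputs for the command: desired return and desired horizon
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, state, desired_return, desired_horizon):
        command = torch.stack([desired_return, desired_horizon], dim=-1)
        return self.net(torch.cat([state, command], dim=-1))

def udrl_update(policy, optimiser, batch):
    """One supervised-learning step: predict the action that was taken,
    given the state and the return/horizon that actually followed it.
    No bootstrapping, off-policy corrections, or discounting involved."""
    states, actions, returns_to_go, horizons = batch
    logits = policy(states, returns_to_go, horizons)
    loss = F.cross_entropy(logits, actions)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage with random tensors standing in for replayed episodes.
    policy = CommandConditionedPolicy(state_dim=4, num_actions=2)
    optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
    batch = (
        torch.randn(32, 4),                    # states
        torch.randint(0, 2, (32,)),            # actions taken
        torch.rand(32) * 200.0,                # observed returns-to-go
        torch.randint(1, 200, (32,)).float(),  # remaining horizons
    )
    print("loss:", udrl_update(policy, optimiser, batch))
```

The same conditioning mechanism is what lets one architecture cover the paper's different paradigms: swapping the command (e.g. a goal state instead of a return) changes the setting without changing the supervised-learning objective.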
