Online and Offline Reinforcement Learning by Planning with a Learned Model

04/13/2021
by Julian Schrittwieser, et al.

Learning efficiently from small amounts of data has long been the focus of model-based reinforcement learning, both in the online case, when interacting with the environment, and in the offline case, when learning from a fixed dataset. However, to date no single unified algorithm has demonstrated state-of-the-art results in both settings. In this work, we describe the Reanalyse algorithm, which uses model-based policy and value improvement operators to compute new, improved training targets for existing data points, allowing efficient learning across data budgets that vary by several orders of magnitude. We further show that Reanalyse can also be used to learn entirely from demonstrations without any environment interactions, as in offline reinforcement learning (offline RL). Combining Reanalyse with the MuZero algorithm, we introduce MuZero Unplugged, a single unified algorithm for any data budget, including offline RL. In contrast to previous work, our algorithm does not require any special adaptations for the off-policy or offline RL settings. MuZero Unplugged sets new state-of-the-art results in the RL Unplugged offline RL benchmark as well as in the online RL benchmark of Atari in the standard 200 million frame setting.
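The core Reanalyse idea, recomputing training targets for already-stored data with the current model, can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: `ToyModel`, `improved_policy`, `value_target`, and `reanalyse` are hypothetical names, the environment is a tiny discrete chain, and a one-step lookahead with a softmax over Q-values stands in for the Monte Carlo Tree Search that MuZero uses as its policy improvement operator. The value target is assumed to be an n-step return bootstrapped from the current value estimate.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical stand-in for a learned model. In MuZero these would be neural
# networks acting on latent states; plain callables keep the sketch runnable.
@dataclass
class ToyModel:
    dynamics: Callable[[int, int], int]   # (state, action) -> next state
    reward: Callable[[int, int], float]   # (state, action) -> predicted reward
    value: Callable[[int], float]         # state -> current value estimate
    num_actions: int


def improved_policy(model: ToyModel, state: int, discount: float,
                    temperature: float = 1.0) -> List[float]:
    """Policy improvement by one-step lookahead (a simplified substitute for MCTS).

    Returns a softmax over Q(s, a) = r(s, a) + discount * V(dynamics(s, a)).
    """
    q = [model.reward(state, a) + discount * model.value(model.dynamics(state, a))
         for a in range(model.num_actions)]
    m = max(q)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in q]
    z = sum(exps)
    return [e / z for e in exps]


def value_target(states: List[int], rewards: List[float], t: int,
                 model: ToyModel, n: int, discount: float) -> float:
    """n-step return from step t, bootstrapped with the current value estimate."""
    horizon = min(t + n, len(rewards))
    ret = sum(discount ** (k - t) * rewards[k] for k in range(t, horizon))
    if horizon < len(states):
        ret += discount ** (horizon - t) * model.value(states[horizon])
    return ret


def reanalyse(states: List[int], rewards: List[float], model: ToyModel,
              n: int = 2, discount: float = 0.9) -> List[Tuple[List[float], float]]:
    """Recompute fresh (policy, value) targets for every step of a stored trajectory."""
    return [(improved_policy(model, states[t], discount),
             value_target(states, rewards, t, model, n, discount))
            for t in range(len(rewards))]


# Hypothetical toy chain: states 0..3, action 0 = stay, action 1 = step right.
model = ToyModel(
    dynamics=lambda s, a: min(s + a, 3),
    reward=lambda s, a: 1.0 if (s == 2 and a == 1) else 0.0,
    value=lambda s: [0.5, 0.6, 0.8, 0.0][s],  # pretend output of the current value net
    num_actions=2,
)
states = [0, 1, 2, 3]       # stored states; states[-1] is the final state
rewards = [0.0, 0.0, 1.0]   # rewards[t] received after acting in states[t]
targets = reanalyse(states, rewards, model, n=2, discount=0.9)
for t, (pi, v) in enumerate(targets):
    print(t, [round(p, 3) for p in pi], round(v, 3))
```

Because the targets are recomputed from whatever the model currently predicts rather than frozen at collection time, the same fixed dataset can keep yielding better targets as the model improves, which is what lets this kind of scheme apply to offline data. In a real system the lookahead would be a full search and the model components would be learned networks.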
