MOReL: Model-Based Offline Reinforcement Learning

05/12/2020
by Rahul Kidambi, et al.

In offline reinforcement learning (RL), the goal is to learn a successful policy using only a dataset of historical interactions with the environment, without any additional online interactions. This serves as an extreme test for an agent's ability to effectively use historical data, which is critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based RL in the offline setting. This framework consists of two steps: (a) learning a pessimistic MDP model using the offline dataset; (b) learning a near-optimal policy in the learned pessimistic MDP. The construction of the pessimistic MDP is such that for any policy, the performance in the real environment is lower bounded by the performance in the pessimistic MDP. This enables the pessimistic MDP to serve as a good surrogate for the purposes of policy evaluation and learning. Overall, MOReL is amenable to detailed theoretical analysis, enables easy and transparent design of practical algorithms, and leads to state-of-the-art results on widely studied offline RL benchmark tasks.
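The two-step recipe in the abstract (build a pessimistic MDP from offline data, then plan in it) can be illustrated with a small tabular sketch. The abstract does not say how pessimism is implemented, so the details here are assumptions. State-action pairs seen fewer than a hypothetical `min_count` times are treated as "unknown". Such pairs are redirected to a hypothetical absorbing `HALT` state with reward `-kappa`. Planning then uses plain value iteration. The names `build_pessimistic_mdp`, `value_iteration`, `HALT`, `min_count`, and `kappa` are illustrative only. This is a toy, not the authors' implementation, and it does not by itself establish the lower-bound guarantee.

```python
from collections import defaultdict

HALT = "HALT"  # hypothetical absorbing state for unknown state-action pairs


def build_pessimistic_mdp(dataset, actions, min_count=2, kappa=10.0):
    """Tabular sketch: estimate a model from (s, a, r, s_next) tuples and
    send rarely seen pairs (fewer than min_count visits) to HALT with reward -kappa."""
    counts = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for s, a, r, s_next in dataset:
        counts[(s, a)] += 1
        reward_sum[(s, a)] += r
        next_counts[(s, a)][s_next] += 1

    states = {s for s, _, _, _ in dataset} | {sn for _, _, _, sn in dataset}
    P, R = {}, {}
    for s in states:
        for a in actions:
            n = counts[(s, a)]
            if n < min_count:
                # Unknown pair: pessimistically halt with a large penalty.
                P[(s, a)] = {HALT: 1.0}
                R[(s, a)] = -kappa
            else:
                # Known pair: empirical transition probabilities and mean reward.
                P[(s, a)] = {sn: c / n for sn, c in next_counts[(s, a)].items()}
                R[(s, a)] = reward_sum[(s, a)] / n
    # HALT is absorbing and yields zero reward once entered.
    for a in actions:
        P[(HALT, a)] = {HALT: 1.0}
        R[(HALT, a)] = 0.0
    return states | {HALT}, P, R


def q_value(s, a, P, R, V, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[(s, a)] + gamma * sum(p * V[sn] for sn, p in P[(s, a)].items())


def value_iteration(states, actions, P, R, gamma=0.9, iters=200):
    """Plan in the (pessimistic) model with synchronous value iteration."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(q_value(s, a, P, R, V, gamma) for a in actions) for s in states}
    policy = {s: max(actions, key=lambda a: q_value(s, a, P, R, V, gamma)) for s in states}
    return V, policy


# Toy offline dataset: (state, action, reward, next_state).
actions = ["stay", "go"]
dataset = [
    (0, "go", 1.0, 1), (0, "go", 1.0, 1),
    (1, "stay", 0.5, 1), (1, "stay", 0.5, 1),
    (0, "stay", 0.0, 0),  # seen only once, so treated as unknown
]
states, P, R = build_pessimistic_mdp(dataset, actions, min_count=2, kappa=10.0)
V, policy = value_iteration(states, actions, P, R)
print(policy[0], policy[1])  # prints: go stay
```

The planner avoids `(0, "stay")` and `(1, "go")` because the pessimistic model penalizes leaving the data's support. This matches the intuition behind using the pessimistic MDP as a conservative surrogate for policy learning.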
