Revisiting Design Choices in Model-Based Offline Reinforcement Learning

by Cong Lu, et al.

Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies, circumventing the need for potentially expensive or unsafe online data collection. Significant recent progress has been made in offline model-based reinforcement learning, in which approaches leverage a learned dynamics model. This typically involves constructing a probabilistic model and using the model uncertainty to penalize rewards where data are insufficient, solving for a pessimistic MDP that lower-bounds the true MDP. Existing methods, however, exhibit a gap between theory and practice: the pessimistic return ought to be bounded via the total variation distance between the learned model and the true dynamics, but it is instead implemented as a penalty based on estimated model uncertainty. This has spawned a variety of uncertainty heuristics, with little to no comparison between the differing approaches. In this paper, we compare these heuristics and design novel protocols to investigate their interaction with other hyperparameters, such as the number of models or the rollout horizon. Using these insights, we show that selecting these key hyperparameters with Bayesian optimization produces configurations that differ substantially from those used in existing hand-tuned state-of-the-art methods, and yields drastically stronger performance.
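The uncertainty-penalized reward described above can be sketched as follows. This is a minimal, hypothetical illustration of an MOPO-style penalty in which disagreement across an ensemble of dynamics models stands in for model uncertainty; the penalty weight `lam` and the max-standard-deviation heuristic are illustrative assumptions, one of several heuristics the paper compares.

```python
import numpy as np

def penalized_reward(reward, next_state_preds, lam=1.0):
    """Pessimistic reward: subtract lam times an uncertainty estimate.

    next_state_preds has shape (n_models, state_dim), holding each
    ensemble member's predicted next state for the same (s, a) pair.
    The maximum per-dimension standard deviation across the ensemble
    serves as the uncertainty proxy (an illustrative choice).
    """
    uncertainty = np.std(next_state_preds, axis=0).max()
    return reward - lam * uncertainty
```

When all models agree, the penalty is zero and the reward is unchanged; as ensemble disagreement grows, the penalized reward shrinks, discouraging the policy from exploiting regions with insufficient data.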




