Domain Generalization for Robust Model-Based Offline Reinforcement Learning

11/27/2022
by Alan Clark, et al.

Existing offline reinforcement learning (RL) algorithms typically assume that training data is either 1) generated by a known policy or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground in which we know which demonstrator generated each dataset but make no assumptions about the demonstrators' underlying policies. This is the most natural setting when collecting data from multiple human operators, yet it remains unexplored. Since different demonstrators induce different data distributions, we show that the problem can be naturally framed as one of domain generalization, with each demonstrator corresponding to a different domain. Specifically, we propose Domain-Invariant Model-based Offline RL (DIMORL), in which we apply Risk Extrapolation (REx) (Krueger et al., 2020) to the learning of dynamics and reward models. Our results show that models trained with REx exhibit improved domain generalization performance compared with the natural baseline of pooling all demonstrators' data. We observe that the resulting models frequently enable the learning of superior policies in the offline model-based RL setting, can improve the stability of the policy learning process, and may enable increased exploration.
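
In its variance-penalized form (V-REx), the REx objective is the mean of the per-demonstrator training risks plus a penalty on their variance, which pushes the learned dynamics and reward models toward fits that are equally good across demonstrators. Below is a minimal PyTorch sketch of that objective applied to dynamics-model fitting; the MLP architecture, the vrex_loss helper, and the penalty weight beta are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Sketch of V-REx (Krueger et al., 2020) for fitting a dynamics model across
# multiple demonstrators. Architecture, beta, and data layout are assumptions.

class DynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),  # predicts the next state
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def vrex_loss(model, batches, beta=10.0):
    """batches: one (state, action, next_state) tuple per demonstrator.

    Computes the per-demonstrator prediction risks, then returns
    mean(risks) + beta * variance(risks), i.e. the V-REx objective.
    """
    risks = torch.stack([
        nn.functional.mse_loss(model(s, a), s_next)
        for (s, a, s_next) in batches
    ])
    return risks.mean() + beta * risks.var()

# Example gradient step over three demonstrators' mini-batches:
# model = DynamicsModel(state_dim=11, action_dim=3)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# opt.zero_grad()
# vrex_loss(model, [batch_a, batch_b, batch_c]).backward()
# opt.step()
```

Relative to pooling, the variance term is the only change: with beta = 0 the loss reduces to the pooled-data baseline the abstract compares against (up to batch weighting), so beta directly trades off average fit against cross-demonstrator invariance.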

Related research:

- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL (06/16/2021)
  Offline Reinforcement Learning (RL) aims to extract near-optimal policie...
- MOPO: Model-based Offline Policy Optimization (05/27/2020)
  Offline reinforcement learning (RL) refers to the problem of learning po...
- Offline Reinforcement Learning with Causal Structured World Models (06/03/2022)
  Model-based methods have recently shown promising for offline reinforcem...
- Online and Offline Reinforcement Learning by Planning with a Learned Model (04/13/2021)
  Learning efficiently from small amounts of data has long been the focus ...
- ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data (11/08/2022)
  We propose a new model-based offline RL framework, called Adversarial Mo...
- Model Generation with Provable Coverability for Offline Reinforcement Learning (06/01/2022)
  Model-based offline optimization with dynamics-aware policy provides a n...
- Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief (10/13/2022)
  Model-based offline reinforcement learning (RL) aims to find highly rewa...
