A Unified Framework for Alternating Offline Model Training and Policy Learning

10/12/2022
by Shentao Yang, et al.

In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data and then use the learned model and the fixed dataset for policy learning, without further interaction with the environment. Offline MBRL algorithms can improve the efficiency and stability of policy learning over model-free algorithms. However, in most existing offline MBRL algorithms, the learning objectives for the dynamic model and the policy are isolated from each other. Such an objective mismatch may lead to inferior performance of the learned agents. In this paper, we address this issue by developing an iterative offline MBRL framework in which we maximize a lower bound of the true expected return by alternating between dynamic-model training and policy learning. With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets. Source code is publicly released.
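
To make the idea of alternating between model training and policy learning concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm or objective: the 1-D linear system, the linear feedback policy, the one-step cost, the Gaussian re-weighting of the model loss toward the current policy's actions, and all names (`fit_model`, `improve_policy`, `alternate`, `lam`, `sigma`) are assumptions chosen only for illustration.

```python
import numpy as np

def fit_model(s, a, s_next, w):
    """Weighted least squares for a linear model s' ~ A*s + B*a (toy assumption)."""
    X = np.stack([s, a], axis=1)                      # shape (n, 2)
    gram = X.T @ (w[:, None] * X) + 1e-8 * np.eye(2)  # tiny ridge for stability
    theta = np.linalg.solve(gram, X.T @ (w * s_next))
    return theta[0], theta[1]

def improve_policy(A_hat, B_hat, lam):
    """Closed-form gain k for a = -k*s minimising the model-predicted
    one-step cost (s')^2 + lam * a^2 (hypothetical objective)."""
    return A_hat * B_hat / (B_hat**2 + lam)

def alternate(s, a, s_next, lam=0.1, sigma=0.5, n_rounds=10):
    """Alternate between model fitting and policy improvement on a fixed dataset."""
    k = 0.0
    for _ in range(n_rounds):
        # Model step: up-weight transitions whose logged action is close to
        # what the current policy would do, so the model loss depends on the policy.
        w = np.exp(-((a + k * s) ** 2) / (2 * sigma**2))
        A_hat, B_hat = fit_model(s, a, s_next, w)
        # Policy step: improve the policy against the freshly fitted model.
        k = improve_policy(A_hat, B_hat, lam)
    return k, (A_hat, B_hat)

# Fixed offline dataset from a made-up system s' = 0.9*s + 0.5*a + noise.
rng = np.random.default_rng(0)
n = 500
s = rng.uniform(-1, 1, n)
a = rng.uniform(-1, 1, n)
s_next = 0.9 * s + 0.5 * a + 0.01 * rng.normal(size=n)

k, (A_hat, B_hat) = alternate(s, a, s_next)
```

The point of the sketch is only the loop structure: the model's training weights depend on the current policy, and the policy is then updated against the refitted model, with no new environment interaction at any step.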

research
06/14/2022

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Offline reinforcement learning (RL) extends the paradigm of classical RL...
research
01/26/2023

Model-based Offline Reinforcement Learning with Local Misspecification

We present a model-based offline reinforcement learning policy performan...
research
03/01/2023

The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

We propose a novel approach to addressing two fundamental challenges in ...
research
08/11/2023

Learning Control Policies for Variable Objectives from Offline Data

Offline reinforcement learning provides a viable approach to obtain adva...
research
01/30/2023

Designing an offline reinforcement learning objective from scratch

Offline reinforcement learning has developed rapidly over the recent yea...
research
06/13/2020

Reinforcement Learning as Iterative and Amortised Inference

There are several ways to categorise reinforcement learning (RL) algorit...
research
10/08/2021

Revisiting Design Choices in Model-Based Offline Reinforcement Learning

Offline reinforcement learning enables agents to leverage large pre-coll...
