Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

04/28/2021
by Michael R. Zhang, et al.

Standard dynamics models for continuous control use feedforward computation to predict the conditional distribution of the next state and reward given the current state and action, modeled as a multivariate Gaussian with diagonal covariance. This modeling choice assumes that the dimensions of the next state and reward are conditionally independent given the current state and action, and it may be motivated by the fact that fully observable physics-based simulation environments have deterministic transition dynamics. In this paper, we challenge this conditional independence assumption and propose a family of expressive autoregressive dynamics models that generate the dimensions of the next state and reward sequentially, each conditioned on the previously generated dimensions. We demonstrate that autoregressive dynamics models outperform standard feedforward models in log-likelihood on held-out transitions. Furthermore, we compare different model-based and model-free off-policy evaluation (OPE) methods on RL Unplugged, a suite of offline MuJoCo datasets, and find that autoregressive dynamics models consistently outperform all baselines, achieving a new state of the art. Finally, we show that autoregressive dynamics models are useful for offline policy optimization, both as a way to enrich the replay buffer through data augmentation and as a means of improving performance through model-based planning.
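
The contrast between the two factorizations described above can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the function names, the `conditional` callback interface, and `toy_conditional` are hypothetical stand-ins for a learned network, chosen only to show how an autoregressive model scores and samples a target vector (next state and reward) one dimension at a time, while a diagonal Gaussian scores all dimensions independently.

```python
import numpy as np


def gaussian_log_pdf(x, mean, std):
    """Log density of a scalar (or elementwise) Gaussian."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def diagonal_log_prob(target, mean, std):
    """Feedforward-style baseline: dimensions are independent given the context."""
    return float(np.sum(gaussian_log_pdf(target, mean, std)))


def autoregressive_log_prob(target, context, conditional):
    """Sum of per-dimension log densities, each conditioned on earlier dimensions."""
    total = 0.0
    for i in range(len(target)):
        mean, std = conditional(i, context, target[:i])
        total += gaussian_log_pdf(target[i], mean, std)
    return float(total)


def autoregressive_sample(dim, context, conditional, rng):
    """Generate dimensions sequentially, feeding each sample back as input."""
    out = np.zeros(dim)
    for i in range(dim):
        mean, std = conditional(i, context, out[:i])
        out[i] = rng.normal(mean, std)
    return out


def toy_conditional(i, context, previous):
    """Hypothetical stand-in for a learned network (an assumption, not from the paper).

    Dimension 0 is centered on the sum of the context (state and action);
    each later dimension is tightly centered on the dimension before it.
    """
    if i == 0:
        return float(context.sum()), 1.0
    return float(previous[-1]), 0.1


if __name__ == "__main__":
    context = np.array([0.5, -0.2])      # stands in for (state, action)
    target = np.array([0.3, 0.35, 0.4])  # stands in for (next state, reward)
    print(autoregressive_log_prob(target, context, toy_conditional))
    print(diagonal_log_prob(target, np.full(3, 0.3), np.ones(3)))
    print(autoregressive_sample(3, context, toy_conditional, np.random.default_rng(0)))
```

When the target dimensions are strongly correlated, as in this toy setup, the autoregressive score is higher than the score of a unit-variance diagonal Gaussian centered on the same first-dimension mean, because each conditional can be much sharper once the earlier dimensions are known. If `conditional` ignores `previous`, the two log-probabilities coincide.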

