ENTROPY: Environment Transformer and Offline Policy Optimization

03/07/2023
by Pengqin Wang, et al.

Model-based methods provide an effective approach to offline reinforcement learning (RL). They learn an environment dynamics model from interaction experience and then optimize a policy against the learned model. However, previous model-based offline RL methods lack long-term prediction capability, resulting in large errors when generating multi-step trajectories. We address this issue by developing a sequence modeling architecture, Environment Transformer, which can generate reliable long-horizon trajectories from offline datasets. We then propose a novel model-based offline RL algorithm, ENTROPY, that learns the dynamics model and reward function with the ENvironment TRansformer and performs Offline PolicY optimization. We evaluate the proposed method on MuJoCo continuous control RL environments. Results show that ENTROPY performs comparably to or better than state-of-the-art model-based and model-free offline RL methods, and that it demonstrates stronger long-term trajectory prediction than existing model-based offline methods.
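The multi-step error problem named in the abstract can be illustrated with a toy example. The sketch below is not the Environment Transformer or any method from the paper. It assumes a hypothetical one-dimensional linear system and a hypothetical learned model whose coefficient is slightly off (0.92 instead of 0.9), and it only shows how a small one-step error can grow when a model is rolled out on its own predictions. All names and constants are illustrative assumptions.

```python
from typing import Callable, List

Step = Callable[[float, float], float]


def true_step(state: float, action: float) -> float:
    # Hypothetical ground-truth dynamics (an assumption for illustration).
    return 0.9 * state + action


def learned_step(state: float, action: float) -> float:
    # Hypothetical learned model with a small one-step coefficient error.
    return 0.92 * state + action


def rollout(step: Step, initial_state: float, actions: List[float]) -> List[float]:
    # Feed each predicted state back in as the next input (autoregressive rollout).
    states = []
    state = initial_state
    for action in actions:
        state = step(state, action)
        states.append(state)
    return states


def rollout_errors(initial_state: float, actions: List[float]) -> List[float]:
    # Absolute gap between real and imagined states at each horizon step.
    real = rollout(true_step, initial_state, actions)
    imagined = rollout(learned_step, initial_state, actions)
    return [abs(r - m) for r, m in zip(real, imagined)]


errors = rollout_errors(1.0, [0.1] * 20)
for horizon in (1, 5, 10, 20):
    print(f"horizon {horizon:2d}: abs error {errors[horizon - 1]:.3f}")
```

In this toy setting the gap between the real and imagined trajectories widens with the horizon even though the same actions are applied to both, which is the kind of degradation that motivates models built for long-horizon prediction.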

research
06/16/2022

Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

The learned policy of model-free offline reinforcement learning (RL) met...
research
05/15/2022

Reliable Offline Model-based Optimization for Industrial Process Control

In the research area of offline model-based optimization, novel and prom...
research
04/28/2021

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

Standard dynamics models for continuous control make use of feedforward ...
research
06/01/2023

IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

Model-based reinforcement learning (RL) has shown great promise due to i...
research
06/20/2023

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory

The accurate modeling of dynamics in interactive environments is critica...
research
12/05/2022

Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation

Amazon and other e-commerce sites must employ mechanisms to protect thei...
research
02/13/2021

PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators

We consider offline reinforcement learning (RL) with heterogeneous agent...
