Model-based Offline Policy Optimization with Adversarial Network

09/05/2023
by   Junming Yang, et al.
0

Model-based offline reinforcement learning (RL), which builds a supervised transition model with logging dataset to avoid costly interactions with the online environment, has been a promising approach for offline policy optimization. As the discrepancy between the logging data and online environment may result in a distributional shift problem, many prior works have studied how to build robust transition models conservatively and estimate the model uncertainty accurately. However, the over-conservatism can limit the exploration of the agent, and the uncertainty estimates may be unreliable. In this work, we propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN). The key idea is to use adversarial learning to build a transition model with better generalization, where an adversary is introduced to distinguish between in-distribution and out-of-distribution samples. Moreover, the adversary can naturally provide a quantification of the model's uncertainty with theoretical guarantees. Extensive experiments showed that our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks. It can also generate diverse in-distribution samples, and quantify the uncertainty more accurately.

READ FULL TEXT

page 1

page 3

research
05/27/2020

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning po...
research
02/22/2021

GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning

Offline reinforcement learning approaches can generally be divided to pr...
research
09/16/2023

DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning

Model-based reinforcement learning (RL), which learns environment model ...
research
04/26/2022

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to find near-optimal policies f...
research
09/15/2021

DROMO: Distributionally Robust Offline Model-based Policy Optimization

We consider the problem of offline reinforcement learning with model-bas...
research
09/30/2022

S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning

Offline reinforcement learning (Offline RL) suffers from the innate dist...
research
11/10/2019

A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...

Please sign up or login with your details

Forgot password? Click here to reset