Adversarial Model for Offline Reinforcement Learning

02/21/2023
by   Mohak Bhardwaj, et al.

We propose a novel model-based offline reinforcement learning (RL) framework, Adversarial Model for Offline Reinforcement Learning (ARMOR), which robustly learns policies that improve upon an arbitrary reference policy regardless of data coverage. ARMOR optimizes policies for worst-case performance relative to the reference policy by adversarially training a Markov decision process model. In theory, we prove that ARMOR, with a well-tuned hyperparameter, can compete with the best policy within data coverage when the reference policy is supported by the data. At the same time, ARMOR is robust to hyperparameter choices: the policy it learns with any admissible hyperparameter never performs worse than the reference policy, even when the reference policy is not covered by the dataset. To validate these properties in practice, we design a scalable implementation of ARMOR that, through adversarial training, optimizes policies without the model ensembles typical of model-based methods. We show that ARMOR achieves performance competitive with state-of-the-art offline model-free and model-based RL algorithms and robustly improves the reference policy across a range of hyperparameter choices.
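The worst-case relative objective described in the abstract can be illustrated on a toy tabular problem. The sketch below is not the paper's scalable implementation: it assumes a small, hand-sampled set of candidate MDP models standing in for the models an adversary could choose, and it brute-forces over all deterministic policies instead of using adversarial training. The names `policy_value`, `relative_pessimism`, and every numeric setting are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np


def policy_value(P, R, policy, gamma, d0):
    """Exact discounted return of a deterministic policy in a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    policy: length-S array of action indices, d0: initial state distribution.
    """
    S = R.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy]  # (S, S) transitions under the policy
    r_pi = R[idx, policy]  # (S,) rewards under the policy
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return float(d0 @ v)


def relative_pessimism(models, policies, ref_policy, gamma, d0):
    """Pick the policy maximizing the worst-case gap to the reference policy.

    The inner min over `models` plays the role of the adversarial model.
    """
    best_policy, best_gap = None, -np.inf
    for pi in policies:
        gap = min(
            policy_value(P, R, pi, gamma, d0)
            - policy_value(P, R, ref_policy, gamma, d0)
            for P, R in models
        )
        if gap > best_gap:
            best_policy, best_gap = pi, gap
    return best_policy, best_gap


# Hypothetical setup: 3 states, 2 actions, 4 randomly sampled candidate models.
rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9
d0 = np.full(S, 1.0 / S)
models = []
for _ in range(4):
    P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
    R = rng.uniform(size=(S, A))
    models.append((P, R))

policies = [np.array(p) for p in itertools.product(range(A), repeat=S)]
ref_policy = np.array([0, 1, 0])  # arbitrary reference policy

learned, gap = relative_pessimism(models, policies, ref_policy, gamma, d0)
print("learned policy:", learned, "worst-case gap:", gap)
```

Because the reference policy is itself among the candidates and scores a gap of exactly zero under every model, the selected policy's worst-case gap cannot be negative. This mirrors, in miniature, the claim that the learned policy never degrades the reference policy.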

research
11/08/2022

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

We propose a new model-based offline RL framework, called Adversarial Mo...
research
05/21/2022

User-Interactive Offline Reinforcement Learning

Offline reinforcement learning algorithms still lack trust in practice d...
research
01/28/2023

Variational Latent Branching Model for Off-Policy Evaluation

Model-based methods have recently shown great potential for off-policy e...
research
11/10/2019

A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...
research
11/10/2019

Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...
research
01/07/2022

A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation

We are concerned with the problem of hyperparameter selection of offline...
research
03/24/2022

Bellman Residual Orthogonalization for Offline Reinforcement Learning

We introduce a new reinforcement learning principle that approximates th...
