ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

11/08/2022
by Tengyang Xie, et al.

We propose a new model-based offline RL framework, Adversarial Models for Offline Reinforcement Learning (ARMOR), which robustly learns policies that improve upon an arbitrary baseline policy regardless of data coverage. Built on the concept of relative pessimism, ARMOR optimizes the worst-case relative performance when facing uncertainty. In theory, we prove that the policy learned by ARMOR never degrades the performance of the baseline policy for any admissible hyperparameter, and that it can compete with the best policy within data coverage when the hyperparameter is well tuned and the baseline policy is supported by the data. This robust policy improvement property makes ARMOR especially suitable for building real-world learning systems, because in practice ensuring no performance degradation is imperative before considering any benefit learning can bring.
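
To make the idea of relative pessimism concrete, here is a minimal toy sketch. It is not the ARMOR algorithm itself. It assumes a small finite set of candidate models and policies, reads "relative performance" as return minus the baseline's return under the same model, and uses made-up numbers. The function name `relative_pessimism` and the `returns` table are hypothetical.

```python
def relative_pessimism(returns, baseline):
    """Pick the policy with the best worst-case return relative to a baseline.

    returns[m][p] is the (hypothetical) return of policy p under candidate
    model m; baseline is the index of the baseline policy.
    """
    n_policies = len(returns[0])
    # For each policy, take the worst case over models of its return
    # minus the baseline's return under that same model.
    worst_case = [
        min(row[p] - row[baseline] for row in returns)
        for p in range(n_policies)
    ]
    best = max(range(n_policies), key=lambda p: worst_case[p])
    return best, worst_case


# Illustrative numbers only: two candidate models, three policies.
returns = [
    [1.0, 2.0, 5.0],   # returns under candidate model 0
    [1.0, 1.5, -3.0],  # returns under candidate model 1
]
best, worst_case = relative_pessimism(returns, baseline=0)
```

In this sketch the baseline always scores exactly zero against itself, so whenever the baseline is among the candidates, the selected policy's worst-case relative return cannot be negative. This mirrors, in a toy setting, the no-degradation property described above. Policy 2 looks best under model 0 but is rejected because model 1 predicts it falls far below the baseline.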

research
02/21/2023

Adversarial Model for Offline Reinforcement Learning

We propose a novel model-based offline Reinforcement Learning (RL) frame...
research
05/22/2022

Offline Policy Comparison with Confidence: Benchmarks and Baselines

Decision makers often wish to use offline historical data to compare seq...
research
08/10/2022

Robust Reinforcement Learning using Offline Data

The goal of robust reinforcement learning (RL) is to learn a policy that...
research
11/27/2022

Domain Generalization for Robust Model-Based Offline Reinforcement Learning

Existing offline reinforcement learning (RL) algorithms typically assume...
research
05/21/2022

User-Interactive Offline Reinforcement Learning

Offline reinforcement learning algorithms still lack trust in practice d...
research
11/10/2019

Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...
research
01/07/2022

A Theoretical Framework of Almost Hyperparameter-free Hyperparameter Selection Methods for Offline Policy Evaluation

We are concerned with the problem of hyperparameter selection of offline...
