Offline Policy Comparison with Confidence: Benchmarks and Baselines

05/22/2022
by Anurag Koul, et al.

Decision makers often wish to use offline historical data to compare sequential-action policies at various world states. Importantly, computational tools should produce confidence values for such offline policy comparison (OPC) to account for statistical variance and limited data coverage. Nevertheless, there is little work that directly evaluates the quality of confidence values for OPC. In this work, we address this issue by creating benchmarks for OPC with Confidence (OPCC), derived by adding sets of policy comparison queries to datasets from offline reinforcement learning. In addition, we present an empirical evaluation of the risk versus coverage trade-off for a class of model-based baselines. In particular, the baselines learn ensembles of dynamics models, which are used in various ways to produce simulations for answering queries with confidence values. While our results suggest advantages for certain baseline variations, there appears to be significant room for improvement in future work.
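
The abstract does not spell out how the baselines turn ensemble simulations into confidence values, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each ensemble member supplies a return estimate for both policies in a query, answers the query by majority vote, and uses the fraction of agreeing members as the confidence. The risk versus coverage trade-off is then traced by answering the most confident queries first. The function names `answer_query` and `risk_coverage_curve`, the "is A's return lower than B's" query form, and the toy numbers are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def answer_query(returns_a: Sequence[float],
                 returns_b: Sequence[float]) -> Tuple[bool, float]:
    """Answer "is policy A's return lower than policy B's?" by ensemble vote.

    returns_a[i] and returns_b[i] are the returns estimated by a
    (hypothetical) ensemble member i. Confidence is the fraction of members
    agreeing with the majority answer, so it lies in [0.5, 1.0].
    """
    votes = [a < b for a, b in zip(returns_a, returns_b)]
    frac_true = sum(votes) / len(votes)
    prediction = frac_true >= 0.5  # ties are resolved as True (assumption)
    confidence = max(frac_true, 1.0 - frac_true)
    return prediction, confidence


def risk_coverage_curve(predictions: Sequence[bool],
                        confidences: Sequence[float],
                        targets: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (coverage, selective risk) pairs.

    Queries are answered in order of decreasing confidence. Coverage is the
    fraction of queries answered so far; selective risk is the error rate
    among those answered queries.
    """
    order = sorted(range(len(predictions)), key=lambda i: -confidences[i])
    curve = []
    errors = 0
    for k, i in enumerate(order, start=1):
        errors += int(predictions[i] != targets[i])
        curve.append((k / len(order), errors / k))
    return curve


# Toy data (made up for illustration): three queries, four ensemble members.
queries = [
    ([1.0, 1.2, 0.9, 1.1], [2.0, 2.1, 1.9, 2.2]),  # members unanimous
    ([1.0, 3.0, 1.5, 1.2], [2.0, 2.0, 2.0, 2.0]),  # three of four agree
    ([1.0, 3.0, 2.5, 1.9], [2.0, 2.0, 2.0, 2.0]),  # members split evenly
]
targets = [True, True, False]  # assumed ground-truth answers

answers = [answer_query(a, b) for a, b in queries]
predictions = [p for p, _ in answers]
confidences = [c for _, c in answers]
curve = risk_coverage_curve(predictions, confidences, targets)
for coverage, risk in curve:
    print(f"coverage={coverage:.2f} risk={risk:.2f}")
```

In this toy setup the only wrong answer is the one with the lowest confidence, so risk stays at zero until full coverage is reached. A well-calibrated confidence measure should show this pattern: abstaining on low-confidence queries lowers risk at the cost of coverage.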
